R0043/2026-03-28¶
Three queries investigated the cross-domain vocabulary for the phenomenon AI safety researchers call "sycophancy," the regulatory requirements that address it, and whether the vocabulary gap has been recognized in the literature.
Queries¶
Q001 — Cross-Domain Vocabulary Map — Partial vocabulary with systematic gaps
Query: What terms do different industries and disciplines use to describe AI behavior that prioritizes user agreement, comfort, or satisfaction over accuracy, correctness, or safety?
Answer: The vocabulary is systematically asymmetric. Regulated industries have mature terminology for the human side (automation bias, complacency, overtrust) while AI safety alone has terminology for the system side (sycophancy). This divide reflects when each domain's vocabulary was developed: traditional automation era vs. adaptive AI era.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Rich cross-domain vocabulary | Partially supported | — |
| H2: No cross-domain vocabulary | Eliminated | Remote (< 5%) |
| H3: Partial with systematic gaps | Supported | Very likely (80-95%) |
Sources: 10 | Searches: 6
Q002 — Enterprise Requirements — Indirect requirements only
Query: Search for enterprise requirements, procurement specifications, regulatory guidance, or deployment standards that address the sycophancy phenomenon under its domain-specific names.
Answer: Requirements exist, but only indirectly: human oversight mandates (EU AI Act), automation-bias awareness (FDA), general trustworthiness criteria (DoD), and voluntary risk frameworks (NIST). No regulation directly constrains sycophantic system behavior. The vocabulary gap produces a requirements gap.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Substantial direct requirements | Eliminated | Remote (< 5%) |
| H2: No requirements exist | Eliminated | Remote (< 5%) |
| H3: Indirect requirements only | Supported | Very likely (80-95%) |
Sources: 6 | Searches: 3
Q003 — Vocabulary Gap Literature — Recognized broadly but not for sycophancy
Query: Has the vocabulary gap itself been identified as a problem in the AI safety or AI governance literature?
Answer: The broader AI terminology gap is well-recognized, with multiple active bridging efforts (MIT AI Risk Repository, AIR 2024, Standardized Threat Taxonomy). However, the specific sycophancy/overreliance vocabulary gap has not been articulated as a distinct problem. Sycophancy is absent from every major bridging taxonomy examined.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Gap recognized and addressed | Partially supported | — |
| H2: Gap not recognized | Eliminated | Remote (< 5%) |
| H3: Recognized but not for sycophancy | Supported | Likely (55-80%) |
Sources: 5 | Searches: 2
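The hypothesis tables above pair qualitative probability labels with numeric bands. A minimal sketch of that mapping, assuming the three bands that appear in this report (Remote, Likely, Very likely) are the only ones in play here; the underlying methodology may define additional bands for the uncovered 5-55% range:

```python
# Map the qualitative probability labels used in the hypothesis tables
# to numeric bands (as fractions). Band edges are transcribed from the
# parenthesized ranges in the tables above.
BANDS = {
    "Remote": (0.00, 0.05),       # < 5%
    "Likely": (0.55, 0.80),       # 55-80%
    "Very likely": (0.80, 0.95),  # 80-95%
}

def label_for(p: float) -> str:
    """Return the qualitative label whose band contains probability p."""
    for label, (lo, hi) in BANDS.items():
        if lo <= p < hi:
            return label
    raise ValueError(f"no band covers p={p}")

print(label_for(0.90))  # Very likely
print(label_for(0.60))  # Likely
```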
Collection Analysis¶
Cross-Cutting Patterns¶
| Pattern | Queries Affected | Significance |
|---|---|---|
| Human-side/system-side vocabulary divide | Q001, Q002 | The central finding: regulated industries frame the problem as human cognitive failure (automation bias, overreliance); AI safety frames it as system behavioral failure (sycophancy). This framing difference drives both vocabulary gaps and requirements gaps. |
| Vocabulary determines requirements | Q001, Q002 | Regulators who use human-side terms write human-side requirements. The EU AI Act's choice of "automation bias" (not "sycophancy") produced a deployer-awareness obligation (not a system-design constraint). |
| Bridging efforts miss sycophancy | Q001, Q003 | Every major taxonomy bridging effort examined (MIT, AIR 2024, Standardized Threat Taxonomy) omits sycophancy as a distinct category. The phenomenon falls between technical-threat taxonomies and human-factors vocabularies. |
| Traditional automation vs. adaptive AI divide | Q001, Q003 | Existing vocabulary was developed for deterministic automation (autopilots, CDSS). AI that actively adapts output to please users is qualitatively different, and vocabulary has not caught up. |
Collection Statistics¶
| Metric | Value |
|---|---|
| Queries investigated | 3 |
| Queries where the nuanced/conditional hypothesis (H3) prevailed | 3 (Q001, Q002, Q003) |
| Queries where the negative hypothesis (H2) was eliminated | 3 |
| Queries where the affirmative hypothesis (H1) was partially supported or eliminated | 3 |
Source Independence Assessment¶
The 21 sources across all three queries are highly independent. They span:
- Jurisdictions: EU (AI Act), U.S. (NIST, FDA, DoD, FAA), cross-jurisdictional (AIR 2024)
- Disciplines: AI safety, human factors engineering, healthcare informatics, defense policy, law, philosophy
- Source types: legislation, government standards, peer-reviewed journals, policy research, preprints
- Time range: 2010 (Parasuraman & Manzey foundational model) through 2026 (FDA CDS guidance update)
No evidence of citation clustering: the sources do not predominantly cite each other. The AI safety sources (Anthropic, DeepMind, OpenAI) are independent from the regulated-industry sources (EU, DoD, FAA, FDA). The convergence on the human-side/system-side finding across independent sources strengthens confidence.
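The citation-clustering claim can be made mechanical: given each source's outbound citations, count how many sources cite another member of the same collection. An illustrative sketch with hypothetical data; the source IDs and citation lists below are invented for illustration, not taken from the actual 21 sources:

```python
# Toy check for citation clustering: the fraction of a collection's
# sources that cite at least one other source inside the same collection.
def internal_citation_rate(citations: dict[str, set[str]]) -> float:
    """citations maps source ID -> set of IDs it cites (any corpus)."""
    collection = set(citations)
    citing_inside = sum(
        1 for src, cited in citations.items()
        if cited & (collection - {src})
    )
    return citing_inside / len(collection)

# Hypothetical four-source collection.
toy = {
    "eu-ai-act": set(),                  # legislation, cites nothing here
    "fda-cds": {"parasuraman-2010"},     # cites a work outside the set
    "anthropic-syco": {"deepmind-syco"}, # cites a collection member
    "deepmind-syco": set(),
}
print(internal_citation_rate(toy))  # 0.25: one of four cites inside
```

A low rate, as in this toy case, is consistent with the independence claim; a rate near 1.0 would indicate a citation cluster.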
Collection Gaps¶
| Gap | Impact | Mitigation |
|---|---|---|
| Proprietary procurement documents | May contain more specific anti-sycophancy requirements than public regulations | Gap acknowledged; public requirements serve as lower bound |
| Non-English regulatory terminology | EU member states, Asian regulators may use different terms | Gap acknowledged; focus on English-language literature is a limitation |
| Classified/FOUO defense documents | DoD testing criteria may address sycophancy more specifically | Gap acknowledged; CaTE publications suggest calibrated trust focus |
| ISO/IEC 42001 full standard text | Full standard may contain sycophancy-relevant provisions | Gap acknowledged; publicly available summaries suggest no specific provision |
| Professional society standardization | IEEE, ACM, HL7, ARINC working groups may have terminology efforts | Gap acknowledged; web search may not surface these |
Collection Self-Audit¶
| Domain | Rating | Notes |
|---|---|---|
| Eligibility criteria | Low risk | Consistent criteria across all 3 queries: published terminology, formal requirements, and bridging efforts |
| Search comprehensiveness | Some concerns | 11 searches, 110 results dispositioned. Financial-services and enterprise-evaluation searches were the thinnest. Professional society efforts may be under-covered. |
| Evaluation consistency | Low risk | Same scoring framework (GRADE + Cochrane-adapted bias domains) applied across all 21 sources |
| Synthesis fairness | Low risk | H3 (nuanced) prevailed in all 3 queries, which is consistent with the evidence but could reflect a predisposition toward "it depends" answers. Mitigated by the fact that H1 and H2 received fair hearing with evidence explicitly mapped to each. |
Resources¶
Summary¶
| Metric | Value |
|---|---|
| Queries investigated | 3 |
| Files produced | 96 |
| Sources scored | 21 |
| Evidence extracts | 21 |
| Results dispositioned | 16 selected + 94 rejected = 110 total |
| Duration (wall clock) | 23m 43s |
| Tool uses (total) | 87 |
Tool Breakdown¶
| Tool | Uses | Purpose |
|---|---|---|
| WebSearch | 15 | Search queries across domains |
| WebFetch | 8 | Page content retrieval for key sources |
| Write | 48 | File creation |
| Read | 4 | Methodology and format document reading |
| Edit | 0 | No file modifications |
| Bash | 12 | Directory creation, batch file generation |
Token Distribution¶
| Category | Tokens |
|---|---|
| Input (context) | ~180,000 |
| Output (generation) | ~45,000 |
| Total | ~225,000 |
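The resource tables lend themselves to a mechanical cross-check; a minimal sketch, with per-tool uses, disposition counts, and token figures transcribed from the tables above:

```python
# Cross-check the resource tables by recomputing their totals.
tool_uses = {
    "WebSearch": 15, "WebFetch": 8, "Write": 48,
    "Read": 4, "Edit": 0, "Bash": 12,
}
tokens = {"input": 180_000, "output": 45_000}
selected, rejected = 16, 94  # results dispositioned

print(sum(tool_uses.values()))  # 87
print(sum(tokens.values()))     # 225000, matches the stated ~225,000
print(selected + rejected)      # 110, matches the stated total
```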