

Research R0043 — Sycophancy Vocabulary
Mode: Query
Run date: 2026-03-28
Queries: 3
Prompt: Unified Research Standard v1.0-draft
Model: Claude Opus 4.6

Three queries investigated the cross-domain vocabulary for the phenomenon AI safety researchers call "sycophancy," the regulatory requirements that address it, and whether the vocabulary gap has been recognized in the literature.

Queries

Q001 — Cross-Domain Vocabulary Map — Partial vocabulary with systematic gaps

Query: What terms do different industries and disciplines use to describe AI behavior that prioritizes user agreement, comfort, or satisfaction over accuracy, correctness, or safety?

Answer: The vocabulary is systematically asymmetric. Regulated industries have mature terminology for the human side (automation bias, complacency, overtrust), while AI safety alone has terminology for the system side (sycophancy). The divide reflects when each domain's vocabulary was developed: the traditional-automation era versus the adaptive-AI era.

Hypothesis | Status | Probability
H1: Rich cross-domain vocabulary | Partially supported | -
H2: No cross-domain vocabulary | Eliminated | Remote (< 5%)
H3: Partial with systematic gaps | Supported | Very likely (80-95%)
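The verbal probability labels used in the hypothesis tables correspond to the numeric bands quoted alongside them. A minimal sketch of that mapping, assuming half-open intervals (the interval convention is not stated in the report):

```python
# Verbal probability labels and the numeric bands quoted in the report.
# Labels the report does not use are omitted rather than guessed.
BANDS = {
    "Remote": (0.00, 0.05),       # "< 5%"
    "Likely": (0.55, 0.80),       # "55-80%"
    "Very likely": (0.80, 0.95),  # "80-95%"
}

def label_for(p: float) -> str:
    """Return the report's verbal label for a probability, if one applies."""
    for label, (lo, hi) in BANDS.items():
        if lo <= p < hi:
            return label
    return "unmapped"

print(label_for(0.90))  # Very likely
print(label_for(0.03))  # Remote
```

Probabilities outside the quoted bands (e.g. 0.30) fall in gaps the report does not label, so the sketch returns "unmapped" rather than inventing a label.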

Sources: 10 | Searches: 6


Q002 — Enterprise Requirements — Indirect requirements only

Query: Search for enterprise requirements, procurement specifications, regulatory guidance, or deployment standards that address the sycophancy phenomenon under its domain-specific names.

Answer: Requirements exist but exclusively through indirect means: human oversight mandates (EU AI Act), automation bias awareness (FDA), general trustworthiness criteria (DoD), and voluntary risk frameworks (NIST). No regulation directly constrains sycophantic system behavior. The vocabulary gap produces a requirements gap.

Hypothesis | Status | Probability
H1: Substantial direct requirements | Eliminated | Remote (< 5%)
H2: No requirements exist | Eliminated | Remote (< 5%)
H3: Indirect requirements only | Supported | Very likely (80-95%)

Sources: 6 | Searches: 3


Q003 — Vocabulary Gap Literature — Recognized broadly but not for sycophancy

Query: Has the vocabulary gap itself been identified as a problem in the AI safety or AI governance literature?

Answer: The broader AI terminology gap is well-recognized, with multiple active bridging efforts (MIT AI Risk Repository, AIR 2024, Standardized Threat Taxonomy). However, the specific sycophancy/overreliance vocabulary gap has not been articulated as a distinct problem. Sycophancy is absent from every major bridging taxonomy examined.

Hypothesis | Status | Probability
H1: Gap recognized and addressed | Partially supported | -
H2: Gap not recognized | Eliminated | Remote (< 5%)
H3: Recognized but not for sycophancy | Supported | Likely (55-80%)

Sources: 5 | Searches: 2



Collection Analysis

Cross-Cutting Patterns

Pattern | Queries Affected | Significance
Human-side/system-side vocabulary divide | Q001, Q002 | The central finding: regulated industries frame the problem as human cognitive failure (automation bias, overreliance); AI safety frames it as system behavioral failure (sycophancy). This framing difference drives both the vocabulary gap and the requirements gap.
Vocabulary determines requirements | Q001, Q002 | Regulators who use human-side terms write human-side requirements. The EU AI Act's choice of "automation bias" (not "sycophancy") produced a deployer-awareness obligation (not a system-design constraint).
Bridging efforts miss sycophancy | Q001, Q003 | Every major taxonomy bridging effort examined (MIT, AIR 2024, Standardized Threat Taxonomy) omits sycophancy as a distinct category. The phenomenon falls between technical-threat taxonomies and human-factors vocabularies.
Traditional automation vs. adaptive AI divide | Q001, Q003 | Existing vocabulary was developed for deterministic automation (autopilots, CDSS). AI that actively adapts output to please users is qualitatively different, and vocabulary has not caught up.
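The human-side/system-side divide can be sketched as a small term map. The grouping follows the report's findings; the per-term domain attributions in the comments are simplified assumptions:

```python
# Illustrative map of the vocabulary divide identified by the research.
VOCABULARY = {
    "human_side": {            # framed as human cognitive failure
        "automation bias",     # aviation, healthcare CDSS, EU AI Act
        "complacency",         # human factors engineering
        "overtrust",           # human-automation interaction
        "overreliance",        # clinical decision support literature
    },
    "system_side": {           # framed as system behavioral failure
        "sycophancy",          # AI safety (sole mature term)
    },
}

# The asymmetry the report identifies: several mature human-side terms,
# a single system-side term.
print(len(VOCABULARY["human_side"]), len(VOCABULARY["system_side"]))  # 4 1
```

A structure like this makes the gap mechanical to spot: any phenomenon with entries on only one side of the map lacks a bridging term.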

Collection Statistics

Metric | Value
Queries investigated | 3
Answered with nuanced/conditional hypothesis (H3) | 3 (Q001, Q002, Q003)
H2 (negative) hypotheses eliminated | 3
H1 (affirmative) hypotheses partially supported or eliminated | 3

Source Independence Assessment

The 21 sources across all three queries are highly independent. They span:

- Jurisdictions: EU (AI Act), U.S. (NIST, FDA, DoD, FAA), cross-jurisdictional (AIR 2024)
- Disciplines: AI safety, human factors engineering, healthcare informatics, defense policy, law, philosophy
- Source types: legislation, government standards, peer-reviewed journals, policy research, preprints
- Time range: 2010 (Parasuraman & Manzey foundational model) through 2026 (FDA CDS guidance update)

No evidence of citation clustering: the sources do not predominantly cite each other. The AI safety sources (Anthropic, DeepMind, OpenAI) are independent from the regulated-industry sources (EU, DoD, FAA, FDA). The convergence on the human-side/system-side finding across independent sources strengthens confidence.
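The citation-clustering check described here can be sketched as counting citation edges that cross between the two source groups. The group memberships follow the report; the citation edges below are hypothetical placeholders, not the actual citation records:

```python
# Sketch of a citation-independence check between two source groups.
# Edge data is invented for illustration only.
ai_safety = {"anthropic", "deepmind", "openai"}
regulated = {"eu_ai_act", "dod", "faa", "fda"}
citations = {("anthropic", "deepmind"), ("faa", "fda")}  # hypothetical

def cross_group_fraction(group_a, group_b, edges):
    """Fraction of citation edges that cross between the two groups."""
    if not edges:
        return 0.0
    cross = sum(1 for a, b in edges
                if (a in group_a) != (b in group_a))
    return cross / len(edges)

# With these placeholder edges, no citation crosses the group boundary,
# consistent with the independence claim above.
print(cross_group_fraction(ai_safety, regulated, citations))  # 0.0
```

A low cross-group fraction alone does not prove independence (sources can converge via shared upstream literature), which is why the report also checks jurisdictions, disciplines, and source types.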

Collection Gaps

Gap | Impact | Mitigation
Proprietary procurement documents | May contain more specific anti-sycophancy requirements than public regulations | Gap acknowledged; public requirements serve as a lower bound
Non-English regulatory terminology | EU member states and Asian regulators may use different terms | Gap acknowledged; the focus on English-language literature is a limitation
Classified/FOUO defense documents | DoD testing criteria may address sycophancy more specifically | Gap acknowledged; CaTE publications suggest a calibrated-trust focus
ISO/IEC 42001 full standard text | Full standard may contain sycophancy-relevant provisions | Gap acknowledged; publicly available summaries suggest no specific provision
Professional society standardization | IEEE, ACM, HL7, ARINC working groups may have terminology efforts | Gap acknowledged; web search may not surface these

Collection Self-Audit

Domain | Rating | Notes
Eligibility criteria | Low risk | Consistent criteria across all 3 queries: published terminology, formal requirements, and bridging efforts
Search comprehensiveness | Some concerns | 11 searches, 110+ results dispositioned. Financial services and enterprise evaluation searches were thinnest. Professional society efforts may be under-covered.
Evaluation consistency | Low risk | Same scoring framework (GRADE + Cochrane-adapted bias domains) applied across all 21 sources
Synthesis fairness | Low risk | H3 (nuanced) prevailed in all 3 queries, which is consistent with the evidence but could reflect a predisposition toward "it depends" answers. Mitigated by the fact that H1 and H2 received a fair hearing, with evidence explicitly mapped to each.

Resources

Summary

Metric | Value
Queries investigated | 3
Files produced | 96
Sources scored | 21
Evidence extracts | 21
Results dispositioned | 16 selected + 94 rejected = 110 total
Duration (wall clock) | 23m 43s
Tool uses (total) | 89

Tool Breakdown

Tool | Uses | Purpose
WebSearch | 15 | Search queries across domains
WebFetch | 8 | Page content retrieval for key sources
Write | 48 | File creation
Read | 4 | Methodology and format document reading
Edit | 0 | No file modifications
Bash | 12 | Directory creation, batch file generation

Token Distribution

Category | Tokens
Input (context) | ~180,000
Output (generation) | ~45,000
Total | ~225,000