Research R0044 — Expanded Vocabulary Research
Mode: Query
Run date: 2026-03-29
Queries: 4
Prompt: Unified Research Standard v1.0-draft
Model: Claude Opus 4.6

This run investigates whether the expanded vocabulary from human-factors and AI safety research reveals regulatory requirements, documented harms, vocabulary bridges, and institutional knowledge about AI systems that reinforce user assumptions rather than challenge them. The four queries explore, in turn, the regulatory landscape, the consequence evidence, the vocabulary gap, and the most sophisticated institutional actor (the DoD CaTE center).

Queries

Q001 — Regulatory requirements constraining AI system behavior — H3: Indirect/nascent

Query: Using the expanded vocabulary, search for enterprise or government requirements that constrain AI system behavior — not just human operator behavior — to prevent the system from reinforcing user assumptions or providing agreeable-but-incorrect output. Focus on defense, healthcare, aviation, and financial services.

Answer: System-side requirements exist across defense, aviation, and general AI governance (NIST, EU AI Act), but they address system design (transparency, oversight enablement) rather than system output content (preventing agreeable-but-incorrect responses). No regulation explicitly prohibits AI sycophancy. Financial services regulation remains entirely human-focused.

Hypothesis | Status | Probability
H1: System-side requirements exist | Partially supported | Likely (55-80%)
H2: No system-side requirements | Eliminated
H3: Indirect/nascent requirements | Supported | Very likely (80-95%)

Sources: 6 | Searches: 5

Q002 — Consequences of agreeable AI in professional contexts — H3: Primarily automation bias

Query: Search for research on the consequences of AI systems that agree with users rather than challenge them, specifically in high-stakes professional contexts. Look for case studies, incident reports, or empirical studies where agreeable AI output led to measurable harm or near-misses.

Answer: Documented consequences exist across consumer and professional contexts, but with a critical asymmetry: system-side sycophancy harm is primarily documented in consumer/laboratory settings (OpenAI incident, Science study), while professional-context harm comes predominantly from automation bias (human over-reliance) rather than AI designed to agree. The distinction is narrowing as professional tools adopt RLHF optimization.

Hypothesis | Status | Probability
H1: Documented harm exists | Partially supported | Very likely (80-95%)
H2: No documented harm | Eliminated
H3: Primarily automation bias, not sycophancy | Supported | Likely (55-80%)

Sources: 6 | Searches: 3

Q003 — Bridging automation bias and sycophancy vocabularies — H3: Partial/emerging

Query: Has anyone in the regulated industries published research that explicitly connects the human-factors concept of "automation bias" to the AI safety concept of "sycophancy"? Is anyone bridging these two vocabularies?

Answer: Bridging is emerging but not yet systematic. Georgetown CSET's "AI Safety and Automation Bias" paper (November 2024) is the strongest candidate. However, the most comprehensive automation bias systematic review (2025, 35 studies) does not mention sycophancy, and the most sophisticated sycophancy analysis connects to confirmation bias, not automation bias. No publication was found that formally maps the two vocabularies as descriptions of the same underlying phenomenon.

Hypothesis | Status | Probability
H1: Explicit bridging exists | Partially supported | Unlikely (20-45%)
H2: No bridging exists | Eliminated
H3: Partial/emerging bridging | Supported | Likely (55-80%)

Sources: 4 | Searches: 3

Q004 — CaTE publications and system-side scope — H3: System properties, not output behavior

Query: What has the DoD CaTE center published about calibrating trust in AI systems, and does their work address the system-side behavior (AI adjusting output to match user expectations) or only the human-side behavior (users trusting AI too much)?

Answer: CaTE has published a Guidebook and companion guides focused on trust measurement and trustworthiness evaluation. CaTE addresses system design properties (trustworthiness dimensions) and human trust calibration, but does NOT address system-side output behavior. The concept of sycophancy is absent from CaTE's vocabulary. CaTE operates on a "measure and inform" paradigm, not a "constrain and prevent" paradigm.

Hypothesis | Status | Probability
H1: Both system-side and human-side | Eliminated
H2: Only human-side | Partially supported
H3: System properties, not output behavior | Supported | Almost certain (95-99%)

Sources: 3 | Searches: 3


Collection Analysis

Cross-Cutting Patterns

Pattern | Queries Affected | Significance
Design vs. output gap | Q001, Q004 | All regulatory frameworks and institutional approaches address system design properties (transparency, explainability) but not system output behavior (preventing agreeableness/sycophancy). This is the central finding of the run.
Vocabulary siloing | Q001, Q003 | Human-factors vocabulary (automation bias, overtrust, calibrated trust) and AI safety vocabulary (sycophancy, RLHF alignment) remain largely separate, creating blind spots in both regulation and research.
Human-side paradigm dominance | Q001, Q002, Q004 | The dominant regulatory and research paradigm frames the problem as a human cognitive vulnerability to be managed, not as system behavior to be constrained. Even the most sophisticated institution (CaTE) operates within this paradigm.
Automation bias vs. sycophancy convergence | Q002 | Professional-context harm currently stems from automation bias (human over-reliance), but as AI tools adopt RLHF optimization the distinction between automation bias and sycophancy will blur. The OpenAI incident previews this convergence.

Collection Statistics

Metric | Value
Queries investigated | 4
H3 (nuanced) supported | 4 (Q001, Q002, Q003, Q004)
H2 (negative) eliminated | 3 (Q001, Q002, Q003)
H2 partially supported | 1 (Q004)
H1 (affirmative) partially supported | 3 (Q001, Q002, Q003)
H1 eliminated | 1 (Q004)
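
The probability labels attached to hypothesis verdicts (Unlikely 20-45%, Likely 55-80%, Very likely 80-95%, Almost certain 95-99%) follow fixed verbal bands that match the ICD 203 probability yardstick. As a minimal sketch of that mapping — the cutoffs below are inferred from the labels in this report, not taken from the run's methodology:

```python
# Map a numeric probability to the verbal bands used in the hypothesis
# tables. Band boundaries follow the ICD 203 yardstick; treat the exact
# cutoffs as an assumption rather than part of the run's methodology.
BANDS = [
    (0.05, "Almost no chance"),
    (0.20, "Very unlikely"),
    (0.45, "Unlikely"),
    (0.55, "Roughly even chance"),
    (0.80, "Likely"),
    (0.95, "Very likely"),
    (0.99, "Almost certain"),
]

def verbal_probability(p: float) -> str:
    """Return the verbal label for a probability p in [0, 1]."""
    for upper, label in BANDS:
        if p <= upper:
            return label
    return "Almost certain"
```

For example, an 85% assessment falls in the 80-95% band and reads back as "Very likely", matching the Q001 H3 verdict above.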

Source Independence Assessment

Sources span a wide range of institutional types: government standards bodies (NIST, FAA, EU Parliament), regulatory agencies (FINRA), military research centers (CaTE/SEI, Sandia), academic publishers (Science, JAMA, ISQ, Springer), policy research centers (Georgetown CSET, ICRC), and technology companies (OpenAI). No single upstream source dominates the evidence base. The convergence on the "design vs. output" gap is independently confirmed across all institutional types.

Collection Gaps

Gap | Impact | Mitigation
PDF extraction failures (NIST AI 600-1, CaTE Guidebook, CSET paper, OMB M-26-04) | May miss specific system behavioral requirements or vocabulary bridging within these documents | Human reviewer should obtain and read the full texts
Engineering-specific evidence absent | Q002 found no engineering case studies of agreeable-AI harm | Search engineering-specific databases (IEEE, ASME)
Financial services case studies absent | No documented financial losses from AI confirmation reinforcement | Search financial incident databases (SEC enforcement, FINRA arbitrations)
Classified military evidence | The most consequential military AI over-reliance incidents may be classified | Accept as a structural limitation

Collection Self-Audit

Domain | Rating | Notes
Eligibility criteria | Low risk | Criteria were well defined by the queries and applied consistently
Search comprehensiveness | Some concerns | PDF extraction failures reduced evidence depth for key sources, though the 14 searches across 4 queries covered the target space
Evaluation consistency | Low risk | Same scoring framework applied across all 19 sources
Synthesis fairness | Low risk | All hypotheses received a fair hearing; the consistent H3 outcome reflects the evidence pattern, not a methodological bias

Resources

Summary

Metric | Value
Queries investigated | 4
Files produced | ~80
Sources scored | 19
Evidence extracts | 19
Results dispositioned | ~50 selected + ~50 rejected = ~100 total
Duration (wall clock) | 26m 13s
Tool uses (total) | 153

Tool Breakdown

Tool | Uses | Purpose
WebSearch | 16 | Search queries across all sectors and topics
WebFetch | 12 | Page content retrieval for key sources
Write | ~80 | File creation for all output files
Read | 4 | Methodology and output-format document reading
Bash | 2 | Directory creation

Token Distribution

Category | Tokens
Input (context) | ~450,000
Output (generation) | ~120,000
Total | ~570,000