R0044/2026-03-29¶
This run investigates whether the expanded vocabulary from human-factors and AI safety research reveals regulatory requirements, documented harms, vocabulary bridges, and institutional knowledge about AI systems that reinforce user assumptions rather than challenging them. The four queries systematically explore the regulatory landscape, consequence evidence, vocabulary gap, and the most sophisticated institutional actor (DoD CaTE).
Queries¶
Q001 — Regulatory requirements constraining AI system behavior — H3: Indirect/nascent
Query: Using the expanded vocabulary, search for enterprise or government requirements that constrain AI system behavior — not just human operator behavior — to prevent the system from reinforcing user assumptions or providing agreeable-but-incorrect output. Focus on defense, healthcare, aviation, and financial services.
Answer: System-side requirements exist across defense, aviation, and general AI governance (NIST, EU AI Act), but they address system design (transparency, oversight enablement) rather than system output content (preventing agreeable-but-incorrect responses). No regulation explicitly prohibits AI sycophancy. Financial services regulation remains entirely human-focused.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: System-side requirements exist | Partially supported | Likely (55-80%) |
| H2: No system-side requirements | Eliminated | — |
| H3: Indirect/nascent requirements | Supported | Very likely (80-95%) |
Sources: 6 | Searches: 5
Q002 — Consequences of agreeable AI in professional contexts — H3: Primarily automation bias
Query: Search for research on the consequences of AI systems that agree with users rather than challenge them, specifically in high-stakes professional contexts. Look for case studies, incident reports, or empirical studies where agreeable AI output led to measurable harm or near-misses.
Answer: Documented consequences exist across consumer and professional contexts, but with a critical asymmetry: system-side sycophancy harm is primarily documented in consumer/laboratory settings (OpenAI incident, Science study), while professional-context harm comes predominantly from automation bias (human over-reliance) rather than AI designed to agree. The distinction is narrowing as professional tools adopt RLHF optimization.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Documented harm exists | Partially supported | Very likely (80-95%) |
| H2: No documented harm | Eliminated | — |
| H3: Primarily automation bias, not sycophancy | Supported | Likely (55-80%) |
Sources: 6 | Searches: 3
Q003 — Bridging automation bias and sycophancy vocabularies — H3: Partial/emerging
Query: Has anyone in the regulated industries published research that explicitly connects the human-factors concept of "automation bias" to the AI safety concept of "sycophancy"? Is anyone bridging these two vocabularies?
Answer: Bridging is emerging but not yet systematic. Georgetown CSET's "AI Safety and Automation Bias" paper (November 2024) is the strongest candidate. However, the most comprehensive automation bias systematic review (2025, 35 studies) does not mention sycophancy, and the most sophisticated sycophancy analysis connects to confirmation bias, not automation bias. No publication was found that formally maps the two vocabularies as descriptions of the same underlying phenomenon.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Explicit bridging exists | Partially supported | Unlikely (20-45%) |
| H2: No bridging exists | Eliminated | — |
| H3: Partial/emerging bridging | Supported | Likely (55-80%) |
Sources: 4 | Searches: 3
Q004 — CaTE publications and system-side scope — H3: System properties, not output behavior
Query: What has the DoD CaTE center published about calibrating trust in AI systems, and does their work address the system-side behavior (AI adjusting output to match user expectations) or only the human-side behavior (users trusting AI too much)?
Answer: CaTE has published a Guidebook and companion guides focused on trust measurement and trustworthiness evaluation. CaTE addresses system design properties (trustworthiness dimensions) and human trust calibration, but does NOT address system-side output behavior. The concept of sycophancy is absent from CaTE's vocabulary. CaTE operates on a "measure and inform" paradigm, not a "constrain and prevent" paradigm.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Both system-side and human-side | Eliminated | — |
| H2: Only human-side | Partially supported | — |
| H3: System properties, not output behavior | Supported | Almost certain (95-99%) |
Sources: 3 | Searches: 3
Collection Analysis¶
Cross-Cutting Patterns¶
| Pattern | Queries Affected | Significance |
|---|---|---|
| Design vs. output gap | Q001, Q004 | All regulatory frameworks and institutional approaches address system design properties (transparency, explainability) but not system output behavior (preventing agreeableness/sycophancy). This is the central finding of the run. |
| Vocabulary siloing | Q001, Q003 | Human factors vocabulary (automation bias, overtrust, calibrated trust) and AI safety vocabulary (sycophancy, RLHF alignment) remain largely separate, creating blind spots in both regulation and research. |
| Human-side paradigm dominance | Q001, Q002, Q004 | The dominant regulatory and research paradigm frames the problem as human cognitive vulnerability to be managed, not as system behavior to be constrained. Even the most sophisticated institution (CaTE) operates within this paradigm. |
| Automation bias vs. sycophancy convergence | Q002 | Professional-context harm is currently from automation bias (human over-reliance), but as AI tools adopt RLHF optimization, the distinction between automation bias and sycophancy will blur. The OpenAI incident previews this convergence. |
Collection Statistics¶
| Metric | Value |
|---|---|
| Queries investigated | 4 |
| H3 (nuanced) supported | 4 (Q001, Q002, Q003, Q004) |
| H2 (negative) eliminated | 4 (Q001, Q002, Q003, Q004) |
| H1 (affirmative) partially supported | 3 (Q001, Q002, Q003) |
| H1 eliminated | 1 (Q004) |
Source Independence Assessment¶
Sources span a wide range of institutional types: government standards bodies (NIST, FAA, EU Parliament), regulatory agencies (FINRA), military research centers (CaTE/SEI, Sandia), academic publishers (Science, JAMA, ISQ, Springer), policy research centers (Georgetown CSET, ICRC), and technology companies (OpenAI). No single upstream source dominates the evidence base. The convergence on the "design vs. output" gap is independently confirmed across all institutional types.
Collection Gaps¶
| Gap | Impact | Mitigation |
|---|---|---|
| PDF extraction failures (NIST AI 600-1, CaTE Guidebook, CSET paper, OMB M-26-04) | May miss specific system behavioral requirements or vocabulary bridging within these documents | Human reviewer should obtain and read full texts |
| Engineering-specific evidence absent | Q002 found no engineering case studies of agreeable AI harm | Search in engineering-specific databases (IEEE, ASME) |
| Financial services case studies absent | No documented financial losses from AI confirmation reinforcement | Search financial incident databases (SEC enforcement, FINRA arbitrations) |
| Classified military evidence | Most consequential military AI over-reliance incidents may be classified | Accept as structural limitation |
Collection Self-Audit¶
| Domain | Rating | Notes |
|---|---|---|
| Eligibility criteria | Low risk | Criteria were well-defined by the queries and applied consistently |
| Search comprehensiveness | Some concerns | PDF extraction failures reduced evidence depth for key sources. 14 searches across 4 queries covered the target space. |
| Evaluation consistency | Low risk | Same scoring framework applied across all 19 sources |
| Synthesis fairness | Low risk | All hypotheses received fair hearing; the consistent H3 outcome reflects the evidence pattern, not a methodological bias |
Resources¶
Summary¶
| Metric | Value |
|---|---|
| Queries investigated | 4 |
| Files produced | ~80 |
| Sources scored | 19 |
| Evidence extracts | 19 |
| Results dispositioned | ~50 selected + ~50 rejected = ~100 total |
| Duration (wall clock) | 26m 13s |
| Tool uses (total) | 153 |
Tool Breakdown¶
| Tool | Uses | Purpose |
|---|---|---|
| WebSearch | 16 | Search queries across all sectors and topics |
| WebFetch | 12 | Page content retrieval for key sources |
| Write | ~80 | File creation for all output files |
| Read | 4 | Methodology and output format document reading |
| Bash | 2 | Directory creation |
Token Distribution¶
| Category | Tokens |
|---|---|
| Input (context) | ~450,000 |
| Output (generation) | ~120,000 |
| Total | ~570,000 |