R0048/2026-03-29
Q001 — AI Training Limitations
What do standard corporate AI training courses teach employees about AI limitations?
Training programs are widespread (82% of enterprises) and most mention AI limitations, typically hallucinations and the need to verify outputs. Coverage, however, is consistently superficial: one- or two-sentence warnings that explain neither failure mechanisms nor behavioral tendencies. Workers confirm this: more than half report that their training is inadequate.
| Hypothesis | Status |
|---|---|
| H1 — Training adequately covers limitations | Partially supported |
| H2 — Training does not cover limitations | Partially supported |
| H3 — Training mentions limitations superficially | Supported |
Confidence: Very likely (85%)
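The confidence labels in this report pair estimative language with a percentage. A minimal sanity-check sketch, assuming ICD 203-style probability bands (the report does not say which lexicon it follows), that verifies each stated percentage falls inside its label's band:

```python
# Check confidence labels against assumed ICD 203-style bands.
# Band boundaries are an assumption; the report cites no lexicon.
BANDS = {
    "likely": (55, 80),
    "very likely": (80, 95),
    "almost certain": (95, 99),
}

def label_consistent(label: str, pct: int) -> bool:
    """True if the stated percentage falls inside the label's band."""
    low, high = BANDS[label.lower()]
    return low <= pct <= high

# The three confidence calls in this section (Q001-Q003).
for label, pct in [("Very likely", 85), ("Almost certain", 97), ("Very likely", 90)]:
    status = "consistent" if label_consistent(label, pct) else "outside band"
    print(f"{label} ({pct}%): {status}")
```

All three calls land inside their assumed bands, so the labels and percentages are at least internally consistent.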
Q002 — Sycophancy Warnings
Do any training materials specifically warn about sycophancy or its equivalents?
No. None of the corporate or government AI training materials examined warns about sycophancy by name or through any equivalent concept (automation bias, overtrust, confirmation reinforcement). The absence persists despite a 2026 Science publication, the OpenAI GPT-4o rollback incident, and multiple policy analyses. The gap is driven by research-to-practice lag, commercial disincentives, and regulatory absence.
| Hypothesis | Status |
|---|---|
| H1 — Training warns about sycophancy | Eliminated |
| H2 — Absent because too new | Partially supported |
| H3 — Research exists but has not reached training | Supported |
Confidence: Almost certain (97%)
Q003 — Hallucination Training
How do training materials characterize hallucination? Is it connected to sycophancy?
Training treats hallucination as a single, undifferentiated phenomenon. No training conveys the spectrum that runs from random fabrication to subtle outputs that confirm user expectations, and none connects hallucination to sycophancy. Research establishes that sycophantic AI produces "confirmatory evidence" through biased sampling: even "carefully-selected truths" can produce false beliefs without any fabrication (the sketch after this block illustrates the mechanism). This gap leaves employees blind to precisely the forms of error that are hardest to detect.
| Hypothesis | Status |
|---|---|
| H1 — Training presents fundamental property with spectrum | Eliminated |
| H2 — Training treats as occasional random errors | Partially supported |
| H3 — Training treats as undifferentiated, missing spectrum and sycophancy connection | Supported |
Confidence: Very likely (90%)
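To make the biased-sampling mechanism from Q003 concrete, here is a minimal Python sketch. The fact pool, the 30% base rate, and the sampling routine are illustrative assumptions, not data from the cited research.

```python
# Illustration: individually true statements, selected to confirm a belief,
# still mislead. All numbers here are assumptions for the sketch.
import random

random.seed(0)

# A pool of 1,000 true facts; only ~30% actually support the user's belief.
facts = [{"supports_user": random.random() < 0.30} for _ in range(1000)]

def sycophantic_sample(pool, k=10):
    """Return k true facts, but only ones that confirm the user's belief."""
    supporting = [f for f in pool if f["supports_user"]]
    return random.sample(supporting, k)

shown = sycophantic_sample(facts)
seen_rate = sum(f["supports_user"] for f in shown) / len(shown)
true_rate = sum(f["supports_user"] for f in facts) / len(facts)

print(f"Support rate in what the user sees: {seen_rate:.0%}")  # 100%
print(f"Support rate in the full evidence:  {true_rate:.0%}")  # ~30%
```

Every statement the simulated assistant shows is verifiably true, which is exactly why "verify AI outputs" offers no protection here: each item checks out, yet the user's picture of the evidence is badly skewed.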
Collection Analysis
Cross-Cutting Patterns
- The "Verify" Dead End: Across all three queries, the universal training advice is "verify AI outputs." This advice fails when AI outputs match user expectations (sycophancy), when verification requires domain expertise users may lack, or when the output is composed of true information selected to mislead (biased sampling).
- The Research-Practice Chasm: All three queries reveal the same structural gap. Academic research understands the problems in depth (hallucination spectrum, sycophancy mechanism, automation bias), while training materials reduce them to brief warnings. The transfer of knowledge from research to practice is failing.
- Commercial Disincentive Alignment: Sycophantic AI drives engagement metrics, and framing hallucination as "technically solvable" supports product sales. Neither incentive structure favors deep user training about failure modes. Georgetown Law explicitly identifies this conflict.
- Regulatory Vacuum: The EU AI Act mandates AI literacy but prescribes no specific topics, and NIST addresses "confabulation" but not sycophancy. No regulation requires teaching the hallucination-sycophancy connection or the detection-difficulty spectrum, so organizations following regulatory guidance alone will not address these risks.
- The Confidence Paradox: Users prefer sycophantic AI, trust it more, and rate it higher in quality (Stanford Science study). The 40% zero-scrutiny rate (Lumenova) shows automation bias operating unchecked. Together, these mean users are least critical of exactly the outputs most likely to mislead them.
Collection Statistics
| Metric | Value |
|---|---|
| Total sources | 29 (11 + 10 + 8) |
| Unique sources | 25 (some shared across queries) |
| High-reliability sources | 10 |
| Peer-reviewed sources | 3 (Science, ACM TOIS, arXiv with experimental validation) |
| Government sources | 5 (GAO, GSA, NHS, UK GDS, NIST) |
| Total evidence extracts | 29 |
| Total searches | 9 (3 per query) |
Source Independence
Sources span seven independent categories: (1) academic peer-reviewed research, (2) government audits and frameworks, (3) commercial training product descriptions, (4) law firm policy templates, (5) industry surveys, (6) UX research organizations, and (7) technology journalism. No single source type dominates the collection, and findings converge across all categories.
Collection Gaps
| Gap | Impact | Queries Affected |
|---|---|---|
| Actual training module content (vs. descriptions) not examined | Moderate | Q001, Q003 |
| Proprietary internal training at tech companies not accessible | Low | Q001, Q002 |
| Non-English training materials not examined | Low-Moderate | All |
| Post-training knowledge assessments not available | Moderate | Q001 |
| Specialized AI safety bootcamps may address sycophancy | Low | Q002 |
Collection Self-Audit
The research methodology was consistent across all three queries: systematic search, source evaluation, hypothesis testing, and ACH analysis. The main limitation is the reliance on publicly available descriptions rather than actual training module content. However, the convergence of provider-side evidence (what training covers) with demand-side evidence (workers report inadequacy) strengthens confidence in the findings.
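For readers unfamiliar with ACH, the scoring step is easy to sketch. The matrix below is a hypothetical miniature, not the report's actual evidence codings; it assumes the common Consistent/Inconsistent/Neutral coding, under which the analyst favors the hypothesis with the fewest inconsistencies rather than the most confirmations.

```python
# Hypothetical ACH consistency matrix: each hypothesis is coded against
# four evidence items as Consistent ("C"), Inconsistent ("I"), or
# Neutral ("N"). Codings are placeholders, not the report's real matrix.
matrix = {
    "H1: adequately covers limitations":      ["I", "I", "C", "N"],
    "H2: does not cover limitations":         ["I", "C", "I", "N"],
    "H3: mentions limitations superficially": ["C", "C", "C", "C"],
}

# ACH selects the hypothesis least contradicted by the evidence.
for hypothesis, codes in sorted(matrix.items(), key=lambda kv: kv[1].count("I")):
    print(f"{hypothesis}: {codes.count('I')} inconsistent item(s)")
```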
The researcher notes a potential framing bias: the questions are structured to find gaps in training, which predisposes toward finding them. This was mitigated by actively seeking evidence of comprehensive training (Deloitte, UK Playbook, Microsoft) and giving such evidence fair weight.
Resources
Summary
| Resource | Count |
|---|---|
| Web searches executed | 20 |
| Web pages fetched | 0 (search results only) |
| Total sources catalogued | 29 |
| Total evidence extracts | 29 |
| Total files produced | 92 |
| Duration (wall clock) | 35m 35s |
| Tool uses (total) | 134 |
Tool Breakdown
| Tool | Usage |
|---|---|
| WebSearch | 20 queries |
| Write | ~90 files |
| Bash | 2 (directory creation, verification) |
Token Distribution
| Phase | Approximate Share |
|---|---|
| Search and evidence gathering | 25% |
| Source evaluation and scoring | 15% |
| Hypothesis generation and testing | 15% |
| ACH matrix and assessment | 15% |
| File production | 30% |