Q002 — Sycophancy Warnings — ACH Matrix


Research	R0048 — Corporate AI Training
Run	2026-03-29
Query	Q002 — Sycophancy Warnings

Matrix

Evidence	H1 — Training warns	H2 — Absent (too new)	H3 — Research-practice gap
SRC01-E01 — IPR calls it "hidden" flaw	--	+	++
SRC02-E01 — Firms won't self-regulate	--	N/A	++
SRC03-E01 — Published in Science	--	--	++
SRC04-E01 — GPT-4o rollback incident	--	+	++
SRC05-E01 — Practical mitigations in UX lit	--	-	++
SRC06-E01 — Bayesian analysis exists	--	--	++
SRC07-E01 — MS Research recommends but MS products don't implement	--	--	++
SRC08-E01 — NIST uses "confabulation" not sycophancy	-	+	+
SRC09-E01 — 40% zero scrutiny	-	N/A	+
SRC10-E01 — No regulation targets sycophancy	--	+	++

Legend

Symbol	Meaning
++	Strongly consistent with hypothesis
+	Consistent with hypothesis
-	Inconsistent with hypothesis
--	Strongly inconsistent with hypothesis
N/A	Not applicable

Diagnosticity Analysis

Most diagnostic evidence:

SRC03-E01 — Publication in Science eliminates H2 (concept too new/niche) while being strongly inconsistent with H1 (training warns) and strongly consistent with H3 (research-practice gap).
SRC07-E01 — Microsoft Research recommending overreliance training while Microsoft products do not implement it is the clearest evidence of the research-to-practice gap.

Least diagnostic evidence:

SRC08-E01 — NIST's treatment of confabulation is partially consistent with both H2 (the term is different) and H3 (the concept is adjacent but not translated).
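One way to make this ranking reproducible is to score each evidence item by how far apart it places H2 and H3. The sketch below is illustrative only and is not the method used to build the matrix: the numeric weights (`++` = 2, `+` = 1, `-` = -1, `--` = -2), the `gap` helper, and the choice to compare only H2 and H3 are all assumptions. H1 is left out because every item is inconsistent with it, so it does not help separate the remaining two hypotheses.

```python
# Numeric weights for the legend symbols (an assumption, not part of the source).
WEIGHT = {"++": 2, "+": 1, "-": -1, "--": -2}

# (H2, H3) scores transcribed from the matrix.
H2_H3 = {
    "SRC01-E01": ("+", "++"),
    "SRC02-E01": ("N/A", "++"),
    "SRC03-E01": ("--", "++"),
    "SRC04-E01": ("+", "++"),
    "SRC05-E01": ("-", "++"),
    "SRC06-E01": ("--", "++"),
    "SRC07-E01": ("--", "++"),
    "SRC08-E01": ("+", "+"),
    "SRC09-E01": ("N/A", "+"),
    "SRC10-E01": ("+", "++"),
}


def gap(h2, h3):
    """Absolute H2/H3 score difference, or None when either score is N/A."""
    if h2 not in WEIGHT or h3 not in WEIGHT:
        return None
    return abs(WEIGHT[h3] - WEIGHT[h2])


gaps = {src: gap(*pair) for src, pair in H2_H3.items()}

# Items with an N/A score cannot be ranked under this measure.
scored = {src: g for src, g in gaps.items() if g is not None}

for src in sorted(scored, key=scored.get, reverse=True):
    print(src, scored[src])
```

Under these assumed weights, SRC03-E01, SRC06-E01 and SRC07-E01 tie for the largest gap (4), and SRC08-E01 has a gap of 0 because it scores H2 and H3 identically. SRC02-E01 and SRC09-E01 are excluded because their H2 score is N/A.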

Outcome

H1 is eliminated (0 consistent, 10 inconsistent). H2 is partially supported (4 consistent) for the narrow claim that sycophancy is absent from training, but its "too new" explanation is contradicted (3 strongly inconsistent, 1 inconsistent, 2 not applicable). H3 is the best-supported hypothesis, with 8 of 10 evidence items strongly consistent and the remaining 2 consistent.
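The counts in this outcome can be rechecked by tallying each hypothesis column of the matrix. The following is a minimal sketch, assuming the matrix is transcribed by hand into a dictionary; the names `MATRIX`, `HYPOTHESES`, and `tally` are hypothetical and exist only for this illustration.

```python
from collections import Counter

# Scores transcribed from the matrix; column order is H1, H2, H3.
MATRIX = {
    "SRC01-E01": ("--", "+", "++"),
    "SRC02-E01": ("--", "N/A", "++"),
    "SRC03-E01": ("--", "--", "++"),
    "SRC04-E01": ("--", "+", "++"),
    "SRC05-E01": ("--", "-", "++"),
    "SRC06-E01": ("--", "--", "++"),
    "SRC07-E01": ("--", "--", "++"),
    "SRC08-E01": ("-", "+", "+"),
    "SRC09-E01": ("-", "N/A", "+"),
    "SRC10-E01": ("--", "+", "++"),
}
HYPOTHESES = ("H1", "H2", "H3")


def tally(matrix):
    """Count how often each legend symbol appears in each hypothesis column."""
    counts = {h: Counter() for h in HYPOTHESES}
    for scores in matrix.values():
        for h, symbol in zip(HYPOTHESES, scores):
            counts[h][symbol] += 1
    return counts


counts = tally(MATRIX)
for h in HYPOTHESES:
    c = counts[h]
    consistent = c["++"] + c["+"]
    inconsistent = c["--"] + c["-"]
    print(
        f"{h}: {consistent} consistent ({c['++']} strongly), "
        f"{inconsistent} inconsistent ({c['--']} strongly), {c['N/A']} N/A"
    )
```

Keeping the raw symbols rather than converting them to numbers means the tally mirrors the legend directly and N/A entries stay visible instead of being silently treated as zero.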