R0020/2026-03-25/Q002: ACH Matrix
Matrix
| Evidence | H1: Mainstream guides address sycophancy | H2: Not addressed in mainstream | H3: Emerging, inconsistent coverage |
|---|---|---|---|
| SRC01-E01: Four causes, academic techniques | + | -- | ++ |
| SRC01-E02: Five critical research gaps | - | N/A | ++ |
| SRC02-E01: Question reframing (24pp reduction) | + | -- | ++ |
| SRC03-E01: NNG behavioral mitigations | + | -- | + |
| SRC04-E01: Industry strategies (~29% prompt contribution) | + | -- | + |
Legend:
- `++`: Strongly supports
- `+`: Supports
- `--`: Strongly contradicts
- `-`: Contradicts
- `N/A`: Not applicable to this hypothesis
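As a rough cross-check, the ratings in the matrix can be tallied per hypothesis. The sketch below is illustrative only. The numeric weights (+2 to -2), the names `SCORE`, `MATRIX`, and `tally`, and the choice to skip N/A cells are assumptions introduced here, not part of the procedure used to build the matrix. It reports a net score and a count of contradicting items for each hypothesis.

```python
# Ordinal encoding of the legend symbols. The numeric weights are an assumption.
SCORE = {"++": 2, "+": 1, "-": -1, "--": -2, "N/A": None}

HYPOTHESES = ("H1", "H2", "H3")

# Ratings transcribed from the matrix, in H1, H2, H3 order.
MATRIX = {
    "SRC01-E01": ("+", "--", "++"),
    "SRC01-E02": ("-", "N/A", "++"),
    "SRC02-E01": ("+", "--", "++"),
    "SRC03-E01": ("+", "--", "+"),
    "SRC04-E01": ("+", "--", "+"),
}


def tally(matrix):
    """Return {hypothesis: (net_score, contradicting_count)}, skipping N/A cells."""
    result = {}
    for col, name in enumerate(HYPOTHESES):
        scores = [SCORE[row[col]] for row in matrix.values()]
        scores = [s for s in scores if s is not None]  # drop N/A
        contradicting = sum(1 for s in scores if s < 0)
        result[name] = (sum(scores), contradicting)
    return result


if __name__ == "__main__":
    for name, (net, contradicting) in tally(MATRIX).items():
        print(f"{name}: net={net:+d}, contradicting items={contradicting}")
```

Under these assumed weights, H3 has no contradicting items and H2 is contradicted by every item that applies to it. A tally like this is a sanity check on the ratings, not a substitute for the qualitative judgment recorded in the report.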
Diagnosticity Analysis
Most Diagnostic Evidence
| Evidence ID | Why Diagnostic |
|---|---|
| SRC02-E01 | Question reframing outperforming direct instruction is uniquely diagnostic: it shows effective techniques exist (contradicts H2) but are academic, not mainstream (supports H3 over H1) |
| SRC01-E02 | Research gaps (measurement inconsistency, scalability) explain why mainstream guides can't yet provide reliable techniques (supports H3, weakens H1) |
Least Diagnostic Evidence
| Evidence ID | Why Non-Diagnostic |
|---|---|
| SRC04-E01 | Supports H1 and H3 equally (+ for both); its unverifiable claims further reduce discriminating power |
| SRC03-E01 | NNG coverage supports both H1 (mainstream awareness) and H3 (coverage is behavioral, not technical) |
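One way to approximate diagnosticity numerically is the gap between an item's ratings for two competing hypotheses: an item rated the same for both cannot help choose between them. Every applicable item contradicts H2, so the informative contrast here is H1 versus H3. The sketch below assumes an ordinal weighting of the legend symbols (+2 to -2). The function `rank_by_gap` and the weights are hypothetical helpers for illustration, not the method used to produce the tables above.

```python
# Ordinal encoding of the legend symbols. The numeric weights are an assumption.
SCORE = {"++": 2, "+": 1, "-": -1, "--": -2, "N/A": None}

# Ratings transcribed from the ACH matrix.
MATRIX = {
    "SRC01-E01": {"H1": "+", "H2": "--", "H3": "++"},
    "SRC01-E02": {"H1": "-", "H2": "N/A", "H3": "++"},
    "SRC02-E01": {"H1": "+", "H2": "--", "H3": "++"},
    "SRC03-E01": {"H1": "+", "H2": "--", "H3": "+"},
    "SRC04-E01": {"H1": "+", "H2": "--", "H3": "+"},
}


def rank_by_gap(matrix, hyp_a, hyp_b):
    """Rank evidence by |score(hyp_a) - score(hyp_b)|, largest gap first.

    Items rated N/A for either hypothesis are left out. Ties keep the
    matrix order because Python's sort is stable.
    """
    gaps = []
    for evidence_id, ratings in matrix.items():
        a, b = SCORE[ratings[hyp_a]], SCORE[ratings[hyp_b]]
        if a is None or b is None:
            continue
        gaps.append((evidence_id, abs(a - b)))
    return sorted(gaps, key=lambda pair: -pair[1])


if __name__ == "__main__":
    for evidence_id, gap in rank_by_gap(MATRIX, "H1", "H3"):
        print(f"{evidence_id}: H1 vs H3 gap = {gap}")
```

With these assumed weights, SRC01-E02 has the largest H1 versus H3 gap, and SRC03-E01 and SRC04-E01 have none, in line with the tables above. SRC02-E01 ties with SRC01-E01 on this measure, so its place among the most diagnostic items rests on the qualitative reasoning given in the table, not on the gap alone.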
Outcome
Hypothesis supported: H3 — Sycophancy is increasingly discussed in mainstream contexts (post-GPT-4o incident), but coverage is inconsistent, often behavioral rather than technical, and the most effective prompt-level techniques remain in academic literature.
Hypotheses eliminated: H2 — Multiple mainstream sources discuss sycophancy.
Hypotheses inconclusive: H1 — Partially supported by awareness growth but undermined by the depth gap between academic and mainstream coverage.