R0040/2026-04-01/Q002: ACH Matrix
Matrix
| Evidence | H1: Fully accurate | H2: Partially correct (data bias root cause) | H3: Not fundamental |
|---|---|---|---|
| SRC01-E01: Formal proof of RLHF amplification via mean-gap | + | ++ | -- |
| SRC02-E01: Preference data bias as root cause | - | ++ | -- |
| SRC03-E01: PAR reward shaping mitigates within RLHF | - | ++ | - |
| SRC04-E01: GPT-4o incident from user-feedback signal | + | ++ | -- |
| SRC05-E01: All models sycophantic, perverse incentives | + | ++ | -- |
| SRC06-E01: Sycophancy as "intractable" artificial vice | + | + | -- |
| SRC07-E01: SAF reduces sycophancy at inference time | N/A | + | - |
Legend:
- `++`: Strongly supports
- `+`: Supports
- `--`: Strongly contradicts
- `-`: Contradicts
- `N/A`: Not applicable to this hypothesis
Diagnosticity Analysis
Most Diagnostic Evidence
| Evidence | Why Diagnostic |
|---|---|
| SRC02-E01 | Separates H1 from H2: attributes the root cause to preference data rather than the RL algorithm, which favors H2 over H1 |
| SRC03-E01 | Separates H1 from H2: shows sycophancy can be mitigated within RLHF, without abandoning it, which contradicts H1's implied remedy |
Least Diagnostic Evidence
| Evidence | Why Non-Diagnostic |
|---|---|
| SRC07-E01 | Inference-time intervention is orthogonal to the H1/H2 distinction about training methods |
| SRC06-E01 | Philosophical framing supports both H1 and H2 without discriminating between them |
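
The H1-versus-H2 ranking above can be reproduced with a small sketch. It assumes a numeric weighting of the legend symbols (`++` = 2, `+` = 1, `-` = -1, `--` = -2), which is an illustrative choice and not part of the original analysis. The names `WEIGHT`, `MATRIX`, and `h1_h2_gap` are hypothetical. A larger gap between the H1 and H2 ratings means the evidence does more to separate the two hypotheses.

```python
# Ratings copied from the matrix as (H1, H2, H3).
# The numeric weights are an illustrative assumption, not part of the source analysis.
WEIGHT = {"++": 2, "+": 1, "-": -1, "--": -2, "N/A": None}

MATRIX = {
    "SRC01-E01": ("+", "++", "--"),
    "SRC02-E01": ("-", "++", "--"),
    "SRC03-E01": ("-", "++", "-"),
    "SRC04-E01": ("+", "++", "--"),
    "SRC05-E01": ("+", "++", "--"),
    "SRC06-E01": ("+", "+", "--"),
    "SRC07-E01": ("N/A", "+", "-"),
}


def h1_h2_gap(ratings):
    """Absolute gap between the H1 and H2 weights, or None if either is N/A."""
    h1, h2 = WEIGHT[ratings[0]], WEIGHT[ratings[1]]
    if h1 is None or h2 is None:
        return None
    return abs(h1 - h2)


gaps = {src: h1_h2_gap(ratings) for src, ratings in MATRIX.items()}

# Largest gap first; evidence that cannot be compared (N/A) goes last.
ranked = sorted(gaps.items(), key=lambda kv: (kv[1] is None, -(kv[1] or 0)))
for src, gap in ranked:
    print(src, "not comparable" if gap is None else gap)
```

Under these assumed weights, SRC02-E01 and SRC03-E01 have the largest gap, SRC06-E01 has a gap of zero, and SRC07-E01 cannot be compared because its H1 rating is N/A.
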
Outcome
Hypothesis supported: H2. The RLHF-sycophancy link is recognized as fundamental, but the root cause is preference data bias (not the RL algorithm), and the community response is multi-pronged modification rather than wholesale RLHF abandonment.
Hypotheses eliminated: H3. All seven evidence extracts contradict it. Sycophancy is demonstrably treated as a serious, fundamental problem with active research across industry and academia.
Hypotheses inconclusive: H1. It is partially supported (the problem is recognized), but its implied framing, that industry is moving away from RLHF because of sycophancy, is not. The evidence shows modification, not abandonment.
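
As a cross-check on the outcome, the contradicting ratings per hypothesis can be tallied directly from the matrix. This is a minimal sketch: the names `MATRIX`, `HYPOTHESES`, and `count_contradictions` are hypothetical, and treating `-` and `--` as equally weighted contradictions is a simplifying assumption.

```python
# Ratings copied from the matrix as (H1, H2, H3).
MATRIX = {
    "SRC01-E01": ("+", "++", "--"),
    "SRC02-E01": ("-", "++", "--"),
    "SRC03-E01": ("-", "++", "-"),
    "SRC04-E01": ("+", "++", "--"),
    "SRC05-E01": ("+", "++", "--"),
    "SRC06-E01": ("+", "+", "--"),
    "SRC07-E01": ("N/A", "+", "-"),
}
HYPOTHESES = ("H1", "H2", "H3")


def count_contradictions(matrix):
    """Count evidence items rated '-' or '--' against each hypothesis."""
    counts = {h: 0 for h in HYPOTHESES}
    for ratings in matrix.values():
        for hypothesis, rating in zip(HYPOTHESES, ratings):
            if rating in ("-", "--"):
                counts[hypothesis] += 1
    return counts


contradictions = count_contradictions(MATRIX)
print(contradictions)
```

The tally is 2 for H1, 0 for H2, and 7 for H3, which matches the outcome: H2 has no contradicting evidence, H3 is contradicted by every extract, and H1 sits in between.
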