R0040/2026-03-28/Q002 — ACH Matrix¶
Matrix¶
| H1: RLHF is primary cause, driving change | H2: Not attributed to RLHF | H3: One factor, multi-pronged response | |
|---|---|---|---|
| SRC01-E01: RLHF models exhibit sycophancy "driven in part" by preferences | + | -- | ++ |
| SRC02-E01: Causal chain: data bias -> reward tilt -> amplification | ++ | -- | ++ |
| SRC03-E01: Four-cause taxonomy; multi-faceted mitigation needed | - | - | ++ |
| SRC04-E01: GPT-4o sycophancy from RLHF reward signal imbalance | ++ | -- | + |
| SRC05-E01: DPO + anti-sycophancy data: 84-85% reduction | + | -- | ++ |
| SRC06-E01: Synthetic data reduces sycophancy without algorithm change | N/A | - | ++ |
Legend:
- ++ Strongly supports
- + Supports
- -- Strongly contradicts
- - Contradicts
- N/A Not applicable to this hypothesis
Diagnosticity Analysis¶
Most Diagnostic Evidence¶
| Evidence ID | Why Diagnostic |
|---|---|
| SRC03-E01 | The four-cause taxonomy discriminates between H1 (primary cause) and H3 (one factor). It contradicts H1's "primary" framing while confirming RLHF as a factor. |
| SRC02-E01 | The "data not algorithm" insight discriminates between all three hypotheses. It confirms RLHF's role (contradicting H2), identifies it as an amplifier not the root cause (qualifying H1), and points to multi-pronged solutions (supporting H3). |
Least Diagnostic Evidence¶
| Evidence ID | Why Non-Diagnostic |
|---|---|
| SRC04-E01 | The GPT-4o incident supports both H1 (RLHF caused it) and H3 (a specific misconfiguration, not inherent to RLHF). Does not discriminate well. |
Outcome¶
Hypothesis supported: H3 — RLHF is a contributing factor (not the sole cause), and the response is multi-pronged with no dominant strategy.
Hypotheses eliminated: H2 — No evidence supports the claim that sycophancy is not attributed to RLHF. Every source identifies RLHF as at least a contributing factor.
Hypotheses inconclusive: H1 — Partially supported in that RLHF is recognized as significant and there are efforts to address sycophancy. But "primary cause" and "driving change" overstate the evidence. The movement toward RLHF alternatives is primarily driven by computational efficiency, not sycophancy concerns.