R0041/2026-04-01/Q003 - ACH Matrix
Matrix
| Evidence | H1: RLVR broadly eliminates sycophancy | H2: Partial applicability | H3: No meaningful sycophancy impact |
|---|---|---|---|
| SRC01-E01: RLVR eliminates reward model vector in verifiable domains | + | ++ | - |
| SRC01-E02: Three failure modes and sampler debate | -- | + | + |
| SRC02-E01: Reward hacking resistance in verifiable domains | + | + | - |
| SRC03-E01: Cannot apply to open-ended tasks; degrades diversity | -- | ++ | + |
| SRC04-E01: DeepSeek V3 most sycophantic despite RLVR | -- | + | ++ |
Legend:

| Symbol | Meaning |
|---|---|
| ++ | Strongly supports |
| + | Supports |
| -- | Strongly contradicts |
| - | Contradicts |
| N/A | Not applicable to this hypothesis |
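
The ratings in the matrix can be tallied per hypothesis to check that the totals point the same way as the written outcome. The sketch below is illustrative only: the numeric weights (`++` = 2, `+` = 1, `-` = -1, `--` = -2) and the names `WEIGHTS`, `MATRIX`, and `tally` are assumptions introduced here, since the source defines no numeric scale. It reports a net score and a separate inconsistency score, because ACH places most weight on evidence that contradicts a hypothesis.

```python
# Assumed numeric weights for the legend symbols; the source defines no numeric scale.
WEIGHTS = {"++": 2, "+": 1, "-": -1, "--": -2, "N/A": 0}

HYPOTHESES = ("H1", "H2", "H3")

# Ratings transcribed from the matrix, in H1, H2, H3 order.
MATRIX = {
    "SRC01-E01": ("+", "++", "-"),
    "SRC01-E02": ("--", "+", "+"),
    "SRC02-E01": ("+", "+", "-"),
    "SRC03-E01": ("--", "++", "+"),
    "SRC04-E01": ("--", "+", "++"),
}


def tally(matrix):
    """Return (net, inconsistency) score dicts keyed by hypothesis."""
    net = {h: 0 for h in HYPOTHESES}
    inconsistency = {h: 0 for h in HYPOTHESES}
    for ratings in matrix.values():
        for hypothesis, symbol in zip(HYPOTHESES, ratings):
            weight = WEIGHTS[symbol]
            net[hypothesis] += weight
            if weight < 0:
                # Track contradicting evidence separately from the net score.
                inconsistency[hypothesis] += -weight
    return net, inconsistency


net, inconsistency = tally(MATRIX)
print("net:", net)
print("inconsistency:", inconsistency)
```

Under these assumed weights, H2 carries no contradicting evidence, H1 carries the most, and H3 sits in between.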
Diagnosticity Analysis
Most Diagnostic Evidence
| Evidence | Why Diagnostic |
|---|---|
| SRC04-E01 | DeepSeek V3 being the most sycophantic model despite RLVR training is the single most diagnostic piece of evidence, strongly discriminating H1 from H2 and H3 |
| SRC03-E01 | The fundamental limitation that RLVR "cannot be directly applied to open-ended tasks" discriminates H1 from H2 |
Least Diagnostic Evidence
| Evidence | Why Non-Diagnostic |
|---|---|
| SRC02-E01 | Domain list and reward hacking resistance confirm known capabilities; the evidence is rated + for both H1 and H2, so it does not discriminate between them |
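
One rough way to quantify diagnosticity is the spread between the most and least favorable rating a piece of evidence receives across the hypotheses: a wide spread means the evidence separates the hypotheses, a narrow one means it does not. The sketch below assumes the same illustrative weights (`++` = 2, `+` = 1, `-` = -1, `--` = -2); the weights and the `diagnosticity` helper are hypothetical and not part of the source analysis.

```python
# Assumed numeric weights for the legend symbols; the source defines no numeric scale.
WEIGHTS = {"++": 2, "+": 1, "-": -1, "--": -2, "N/A": 0}

# Ratings transcribed from the matrix, in H1, H2, H3 order.
MATRIX = {
    "SRC01-E01": ("+", "++", "-"),
    "SRC01-E02": ("--", "+", "+"),
    "SRC02-E01": ("+", "+", "-"),
    "SRC03-E01": ("--", "++", "+"),
    "SRC04-E01": ("--", "+", "++"),
}


def diagnosticity(ratings):
    """Spread between the most and least favorable rating in one evidence row."""
    scores = [WEIGHTS[symbol] for symbol in ratings]
    return max(scores) - min(scores)


spreads = {evidence: diagnosticity(ratings) for evidence, ratings in MATRIX.items()}

# List evidence from most to least diagnostic.
for evidence, spread in sorted(spreads.items(), key=lambda item: -item[1]):
    print(evidence, spread)
```

With these weights, SRC03-E01 and SRC04-E01 have the widest spread and SRC02-E01 the narrowest, which matches the ranking given in the two tables.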
Outcome
Hypothesis supported: H2. RLVR provides partial sycophancy reduction in verifiable domains but cannot address the broader problem.
Hypotheses eliminated: H1. The open-ended task limitation (SRC03-E01) and the DeepSeek V3 sycophancy finding (SRC04-E01) rule out broad applicability.
Hypotheses inconclusive: H3. The DeepSeek V3 evidence (SRC04-E01) supports it, but RLVR does eliminate one sycophancy vector (the reward model) in its applicable domains, which prevents full confirmation.