R0040/2026-04-01/Q002 — ACH Matrix

Matrix

| Evidence | H1: Fully accurate | H2: Partially correct (data bias root cause) | H3: Not fundamental |
|---|---|---|---|
| SRC01-E01: Formal proof of RLHF amplification via mean-gap | + | ++ | -- |
| SRC02-E01: Preference data bias as root cause | - | ++ | -- |
| SRC03-E01: PAR reward shaping mitigates within RLHF | - | ++ | - |
| SRC04-E01: GPT-4o incident from user-feedback signal | + | ++ | -- |
| SRC05-E01: All models sycophantic, perverse incentives | + | ++ | -- |
| SRC06-E01: Sycophancy as "intractable" artificial vice | + | + | -- |
| SRC07-E01: SAF reduces sycophancy at inference time | N/A | + | - |

Legend:

- ++: Strongly supports
- +: Supports
- --: Strongly contradicts
- -: Contradicts
- N/A: Not applicable to this hypothesis
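As a rough cross-check, the matrix can be tallied programmatically. The sketch below copies the ratings from the matrix and maps them to numeric weights. The weights (`++` = 2, `+` = 1, `-` = -1, `--` = -2, `N/A` = 0) and the names `WEIGHTS`, `MATRIX`, and `tally` are illustrative assumptions and not part of the original analysis, which reports no numeric scores.

```python
# Numeric weights are an assumption for illustration; the source matrix is qualitative.
WEIGHTS = {"++": 2, "+": 1, "-": -1, "--": -2, "N/A": 0}

# Ratings copied from the matrix, one row per evidence extract.
MATRIX = {
    "SRC01-E01": {"H1": "+", "H2": "++", "H3": "--"},
    "SRC02-E01": {"H1": "-", "H2": "++", "H3": "--"},
    "SRC03-E01": {"H1": "-", "H2": "++", "H3": "-"},
    "SRC04-E01": {"H1": "+", "H2": "++", "H3": "--"},
    "SRC05-E01": {"H1": "+", "H2": "++", "H3": "--"},
    "SRC06-E01": {"H1": "+", "H2": "+", "H3": "--"},
    "SRC07-E01": {"H1": "N/A", "H2": "+", "H3": "-"},
}


def tally(matrix, weights=WEIGHTS):
    """Return, per hypothesis, the net weighted score and the number of
    extracts rated as contradicting it (- or --)."""
    totals = {}
    for ratings in matrix.values():
        for hypothesis, mark in ratings.items():
            entry = totals.setdefault(hypothesis, {"net": 0, "contradictions": 0})
            entry["net"] += weights[mark]
            if weights[mark] < 0:
                entry["contradictions"] += 1
    return totals


for hypothesis, result in sorted(tally(MATRIX).items()):
    print(hypothesis, result)
```

ACH conventionally favors the hypothesis with the least contradicting evidence over the one with the most support, so the `contradictions` count is the more relevant figure here; the net score is only a secondary summary under the assumed weights.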

Diagnosticity Analysis

Most Diagnostic Evidence

| Evidence | Why Diagnostic |
|---|---|
| SRC02-E01 | Discriminates between H1 and H2: attributes root cause to preference data, not RL algorithm, supporting H2 over H1 |
| SRC03-E01 | Discriminates between H1 and H2: shows RLHF can be fixed without abandonment, contradicting H1's implied remedy |

Least Diagnostic Evidence

| Evidence | Why Non-Diagnostic |
|---|---|
| SRC07-E01 | Inference-time intervention is orthogonal to the H1/H2 distinction about training methods |
| SRC06-E01 | Philosophical framing supports both H1 and H2 without discriminating between them |
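One way to make the diagnosticity ranking concrete is to measure how far apart each extract's H1 and H2 ratings sit. The sketch below is a simplified proxy, not the method used in the analysis: it assumes numeric weights (`++` = 2, `+` = 1, `-` = -1, `--` = -2) and treats any `N/A` rating as having no measurable gap. The names `WEIGHTS`, `RATINGS`, and `h1_h2_gap` are hypothetical.

```python
# Assumed numeric weights; the original matrix is qualitative.
WEIGHTS = {"++": 2, "+": 1, "-": -1, "--": -2}

# (H1 rating, H2 rating) per evidence extract, copied from the matrix.
RATINGS = {
    "SRC01-E01": ("+", "++"),
    "SRC02-E01": ("-", "++"),
    "SRC03-E01": ("-", "++"),
    "SRC04-E01": ("+", "++"),
    "SRC05-E01": ("+", "++"),
    "SRC06-E01": ("+", "+"),
    "SRC07-E01": ("N/A", "+"),
}


def h1_h2_gap(h1, h2):
    """Absolute gap between the two ratings, or None if either is N/A."""
    if h1 not in WEIGHTS or h2 not in WEIGHTS:
        return None
    return abs(WEIGHTS[h1] - WEIGHTS[h2])


GAPS = {source: h1_h2_gap(*pair) for source, pair in RATINGS.items()}

# Rank extracts with a measurable gap from most to least diagnostic.
ranked = sorted(
    (item for item in GAPS.items() if item[1] is not None),
    key=lambda item: item[1],
    reverse=True,
)
for source, gap in ranked:
    print(source, gap)
```

Under these assumed weights, SRC02-E01 and SRC03-E01 show the largest gap, SRC06-E01 shows none, and SRC07-E01 has no defined gap because it is not applicable to H1, which matches the qualitative ranking in the tables.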

Outcome

Hypothesis supported: H2 — The RLHF-sycophancy link is recognized as fundamental, but the root cause is preference data bias (not the RL algorithm), and the community response is multi-pronged modification rather than wholesale RLHF abandonment.

Hypotheses eliminated: H3 — All seven evidence extracts contradict H3. Sycophancy is demonstrably treated as a serious, fundamental problem with active research across industry and academia.

Hypotheses inconclusive: H1 — H1 is partially supported (the problem is recognized) but its implied framing (industry moving away from RLHF because of sycophancy) is not supported. The evidence shows modification, not abandonment.