Q002 — ACH Matrix


Research	R0040 — RLHF Alternatives
Run	2026-04-01
Query	Q002

Matrix

Evidence	H1: Fully accurate	H2: Partially correct (data bias root cause)	H3: Not fundamental
SRC01-E01: Formal proof of RLHF amplification via mean-gap	+	++	--
SRC02-E01: Preference data bias as root cause	-	++	--
SRC03-E01: PAR reward shaping mitigates within RLHF	-	++	-
SRC04-E01: GPT-4o incident from user-feedback signal	+	++	--
SRC05-E01: All models sycophantic, perverse incentives	+	++	--
SRC06-E01: Sycophancy as "intractable" artificial vice	+	+	--
SRC07-E01: SAF reduces sycophancy at inference time	N/A	+	-

Legend:
++	Strongly supports
+	Supports
-	Contradicts
--	Strongly contradicts
N/A	Not applicable to this hypothesis
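The ratings in the matrix above can be tallied per hypothesis as a quick consistency check. The following is a minimal sketch, not part of the original analysis: the numeric weights (`++` = 2, `+` = 1, `-` = -1, `--` = -2) and the names `WEIGHTS`, `MATRIX`, and `tally` are illustrative assumptions, and N/A cells are simply skipped.

```python
# Numeric weights for the legend symbols. The scale is an assumption made
# for illustration; the source matrix only uses the symbols themselves.
WEIGHTS = {"++": 2, "+": 1, "-": -1, "--": -2, "N/A": None}

# Rows copied from the matrix: evidence ID -> ratings for (H1, H2, H3).
MATRIX = {
    "SRC01-E01": ("+", "++", "--"),
    "SRC02-E01": ("-", "++", "--"),
    "SRC03-E01": ("-", "++", "-"),
    "SRC04-E01": ("+", "++", "--"),
    "SRC05-E01": ("+", "++", "--"),
    "SRC06-E01": ("+", "+", "--"),
    "SRC07-E01": ("N/A", "+", "-"),
}
HYPOTHESES = ("H1", "H2", "H3")


def tally(matrix):
    """Return {hypothesis: (net_score, contradiction_count)}, skipping N/A cells."""
    result = {}
    for col, name in enumerate(HYPOTHESES):
        scores = [WEIGHTS[row[col]] for row in matrix.values()]
        rated = [s for s in scores if s is not None]
        contradictions = sum(1 for s in rated if s < 0)
        result[name] = (sum(rated), contradictions)
    return result


totals = tally(MATRIX)
for name, (net, contradictions) in totals.items():
    print(f"{name}: net score {net:+d}, contradicting items {contradictions}")
```

Under these assumed weights, H2 has no contradicting items, H1 has two, and H3 is contradicted by all seven rows. A different weighting would change the net scores but not the contradiction counts.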

Diagnosticity Analysis

Most Diagnostic Evidence

Evidence	Why Diagnostic
SRC02-E01	Discriminates between H1 and H2: attributes root cause to preference data, not RL algorithm, supporting H2 over H1
SRC03-E01	Discriminates between H1 and H2: shows RLHF can be fixed without abandonment, contradicting H1's implied remedy

Least Diagnostic Evidence

Evidence	Why Non-Diagnostic
SRC07-E01	Inference-time intervention is orthogonal to the H1/H2 distinction about training methods
SRC06-E01	Philosophical framing supports both H1 and H2 without discriminating between them
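One way to make the H1/H2 diagnosticity ranking concrete is to measure how far apart each item's H1 and H2 ratings sit. This sketch is an illustration only: the numeric weights and the gap measure (`h1_h2_gap`) are assumptions, not the method used in the source analysis. An item rated N/A for either hypothesis gets no gap, since it cannot discriminate between the two.

```python
# Assumed numeric weights for the legend symbols (illustrative only).
WEIGHTS = {"++": 2, "+": 1, "-": -1, "--": -2, "N/A": None}

# (H1, H2) ratings copied from the matrix.
H1_H2 = {
    "SRC01-E01": ("+", "++"),
    "SRC02-E01": ("-", "++"),
    "SRC03-E01": ("-", "++"),
    "SRC04-E01": ("+", "++"),
    "SRC05-E01": ("+", "++"),
    "SRC06-E01": ("+", "+"),
    "SRC07-E01": ("N/A", "+"),
}


def h1_h2_gap(ratings):
    """Absolute distance between the H1 and H2 weights, or None if either is N/A."""
    h1, h2 = (WEIGHTS[r] for r in ratings)
    if h1 is None or h2 is None:
        return None
    return abs(h1 - h2)


gaps = {eid: h1_h2_gap(ratings) for eid, ratings in H1_H2.items()}

# Rank the items that have a defined gap, largest gap first (ties keep row order).
ranked = sorted(
    (eid for eid, gap in gaps.items() if gap is not None),
    key=lambda eid: -gaps[eid],
)

for eid in ranked:
    print(f"{eid}: gap {gaps[eid]}")
print("No H1/H2 gap (N/A):", [eid for eid, gap in gaps.items() if gap is None])
```

With these assumed weights, SRC02-E01 and SRC03-E01 show the largest gap, SRC06-E01 shows a gap of zero, and SRC07-E01 has no defined gap, which agrees with the two tables above.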

Outcome

Hypothesis supported: H2 — The RLHF-sycophancy link is recognized as fundamental, but the root cause is preference data bias (not the RL algorithm), and the community response is multi-pronged modification rather than wholesale RLHF abandonment.

Hypotheses eliminated: H3 — All seven evidence extracts contradict H3. Sycophancy is demonstrably treated as a serious, fundamental problem with active research across industry and academia.

Hypotheses inconclusive: H1 — H1 is partially supported (the problem is recognized) but its implied framing (industry moving away from RLHF because of sycophancy) is not supported. The evidence shows modification, not abandonment.