Research	R0040 — RLHF Alternatives
Run	2026-03-29
Query	Q001 — RLHF Alternatives

Q001 — RLHF Alternatives — ACH Matrix

Matrix

Evidence	H1	H2	H3
SRC01-E01 — RLHF drives sycophancy	+	-	+
SRC01-E02 — Universal sycophancy	+	--	+
SRC02-E01 — DPO eliminates reward model	++	--	+
SRC02-E02 — DPO OOD limitations	+	+	++
SRC03-E01 — CAI uses principles	++	--	+
SRC04-E01 — RLAIF matches RLHF	++	--	+
SRC05-E01 — RLHF fundamental limits	++	--	++
SRC06-E01 — GRPO halves compute	++	--	+
SRC07-E01 — KTO binary signals	++	--	+
SRC08-E01 — Industry shift	+	--	+

Legend

Symbol	Meaning
++	Strongly consistent
+	Consistent
-	Inconsistent
--	Strongly inconsistent
N/A	Not applicable

Diagnosticity Analysis

Most diagnostic evidence:

SRC02-E02 (DPO OOD limitations): This is the only evidence item rated consistent with H2, and the only one rated more strongly for H3 (++) than for H1 (+). It therefore discriminates between "full replacement" (H1 strong form) and "toolkit augmentation" (H3).
SRC05-E01 (RLHF fundamental limits): Strongly supports both H1 and H3 while strongly contradicting H2, establishing that the search for alternatives is driven by real, documented problems. It separates H2 from the other two hypotheses but does not separate H1 from H3.

Least diagnostic evidence:

SRC08-E01 (Industry shift): Consistent with H1 and H3 and strongly inconsistent with H2, but as industry commentary it adds limited independent discrimination.
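The diagnosticity claims above can be checked mechanically against the matrix. The following is a minimal sketch, not part of the original analysis: the numeric weights assigned to the legend symbols and the helper names (`consistent_with_h2`, `favours_h3_over_h1`, `identical_rows`) are assumptions introduced for illustration.

```python
# Ratings copied from the matrix: evidence ID -> (H1, H2, H3).
MATRIX = {
    "SRC01-E01": ("+", "-", "+"),
    "SRC01-E02": ("+", "--", "+"),
    "SRC02-E01": ("++", "--", "+"),
    "SRC02-E02": ("+", "+", "++"),
    "SRC03-E01": ("++", "--", "+"),
    "SRC04-E01": ("++", "--", "+"),
    "SRC05-E01": ("++", "--", "++"),
    "SRC06-E01": ("++", "--", "+"),
    "SRC07-E01": ("++", "--", "+"),
    "SRC08-E01": ("+", "--", "+"),
}

# Assumed ordinal weights for the legend symbols (illustrative only).
WEIGHTS = {"++": 2, "+": 1, "-": -1, "--": -2}


def consistent_with_h2(matrix):
    """Return evidence IDs whose H2 rating is '+' or '++'."""
    return [eid for eid, (_, h2, _) in matrix.items() if WEIGHTS[h2] > 0]


def favours_h3_over_h1(matrix):
    """Return evidence IDs rated more strongly for H3 than for H1."""
    return [
        eid
        for eid, (h1, _, h3) in matrix.items()
        if WEIGHTS[h3] > WEIGHTS[h1]
    ]


def identical_rows(matrix, eid):
    """Return other evidence IDs with exactly the same ratings as `eid`."""
    return [
        other
        for other, ratings in matrix.items()
        if other != eid and ratings == matrix[eid]
    ]


if __name__ == "__main__":
    print(consistent_with_h2(MATRIX))
    print(favours_h3_over_h1(MATRIX))
    print(identical_rows(MATRIX, "SRC08-E01"))
```

On the matrix as given, the first two checks each return only SRC02-E02. SRC08-E01 shares its exact rating pattern with SRC01-E02, which is one numerical reading of why it adds little independent discrimination.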

Outcome

H1 is the best-supported hypothesis. It is consistent or strongly consistent with all 10 evidence items (6 strongly consistent, 4 consistent). H2 is effectively eliminated, being strongly inconsistent with 8 of 10 evidence items and consistent with only one (SRC02-E02). H3 is also consistent with all 10 items (2 strongly consistent, 8 consistent) and is well-supported as a complementary framework to H1, adding nuance about the coexistence of methods rather than clean replacement.
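The counts behind this outcome can be reproduced by tallying each hypothesis column. This is a rough sketch under stated assumptions: the `COLUMNS` layout and the helpers `tally` and `inconsistency_count` are hypothetical names, and ranking hypotheses by their number of inconsistent ratings follows the usual ACH convention rather than anything stated in this report.

```python
from collections import Counter

# Columns copied from the matrix, in row order SRC01-E01 ... SRC08-E01.
COLUMNS = {
    "H1": ["+", "+", "++", "+", "++", "++", "++", "++", "++", "+"],
    "H2": ["-", "--", "--", "+", "--", "--", "--", "--", "--", "--"],
    "H3": ["+", "+", "+", "++", "+", "+", "++", "+", "+", "+"],
}


def tally(columns):
    """Count how often each legend symbol appears per hypothesis."""
    return {hyp: Counter(ratings) for hyp, ratings in columns.items()}


def inconsistency_count(counter):
    """Number of items rated '-' or '--' for one hypothesis."""
    return counter["-"] + counter["--"]


counts = tally(COLUMNS)

if __name__ == "__main__":
    for hyp, counter in counts.items():
        print(hyp, dict(counter), "inconsistent:", inconsistency_count(counter))
```

Counting inconsistencies rather than summing support reflects the ACH practice of rejecting the hypothesis with the most contradicting evidence. On that measure H1 and H3 both score zero, so the preference for H1 rests on its larger share of strongly consistent ratings.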