Q001 — ACH Matrix


Research	R0040 — RLHF Alternatives
Run	2026-03-28
Query	Q001

Matrix

Evidence	H1: Multiple viable alternatives	H2: No viable alternatives	H3: Modifications not replacements
SRC01-E01: Three alternatives in production use	++	--	+
SRC02-E01: DPO matches/exceeds RLHF at NeurIPS	++	--	+
SRC03-E01: CAI deployed for all Claude models since 2022	++	--	+
SRC04-E01: GRPO halves compute, deployed in DeepSeek-R1	++	--	N/A
SRC05-E01: KTO matches DPO with binary signals; HALO framework	+	--	++
SRC06-E01: CAI as "enhancement" to RLHF; human feedback as "moat"	+	-	++
SRC07-E01: ORPO eliminates reference model and alignment phase	++	--	-

Legend:

- ++ Strongly supports
- + Supports
- -- Strongly contradicts
- - Contradicts
- N/A Not applicable to this hypothesis
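One way to read the matrix at a glance is to convert each rating to a number and sum per hypothesis. The Python sketch below does this. The weights (++ = 2, + = 1, - = -1, -- = -2, N/A skipped) and the names `WEIGHTS`, `MATRIX`, and `tally` are assumptions introduced for illustration and are not part of the original analysis. ACH practice also tends to weigh inconsistent evidence more heavily than a plain sum does, so treat the totals as a rough summary only.

```python
# Hypothetical numeric weights for the legend symbols (an assumption, not from the source).
WEIGHTS = {"++": 2, "+": 1, "-": -1, "--": -2, "N/A": None}

HYPOTHESES = ("H1", "H2", "H3")

# Ratings copied from the matrix, one row per evidence item, in H1/H2/H3 order.
MATRIX = {
    "SRC01-E01": ("++", "--", "+"),
    "SRC02-E01": ("++", "--", "+"),
    "SRC03-E01": ("++", "--", "+"),
    "SRC04-E01": ("++", "--", "N/A"),
    "SRC05-E01": ("+", "--", "++"),
    "SRC06-E01": ("+", "-", "++"),
    "SRC07-E01": ("++", "--", "-"),
}


def tally(matrix):
    """Sum the weighted ratings per hypothesis, skipping N/A cells."""
    totals = {h: 0 for h in HYPOTHESES}
    for ratings in matrix.values():
        for hypothesis, symbol in zip(HYPOTHESES, ratings):
            weight = WEIGHTS[symbol]
            if weight is not None:
                totals[hypothesis] += weight
    return totals


totals = tally(MATRIX)
print(totals)  # {'H1': 12, 'H2': -13, 'H3': 6}
```

Under these assumed weights the ordering is H1 > H3 > H2, which agrees with the outcome reported for this query.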

Diagnosticity Analysis

Most Diagnostic Evidence

Evidence ID	Why Diagnostic
SRC05-E01	The HALO framework simultaneously supports H1 (KTO works) and H3 (all methods are a unified family). It is the most diagnostic item because it discriminates between the "revolution" and "evolution" readings.
SRC06-E01	The "enhancement not replacement" characterization and "competitive moat" observation directly inform the H1 vs H3 distinction.
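The H1 versus H3 distinction can also be checked mechanically against the matrix by looking for rows where H3 is rated above H1. The sketch below is one possible reading and not the method used to build the table. The ordinal scale and the names `SCALE`, `H1_H3`, and `favors_h3` are hypothetical.

```python
# Hypothetical ordinal scale for the legend symbols (an assumption, not from the source).
SCALE = {"++": 2, "+": 1, "-": -1, "--": -2}

# (H1 rating, H3 rating) per evidence item, copied from the matrix.
H1_H3 = {
    "SRC01-E01": ("++", "+"),
    "SRC02-E01": ("++", "+"),
    "SRC03-E01": ("++", "+"),
    "SRC04-E01": ("++", "N/A"),
    "SRC05-E01": ("+", "++"),
    "SRC06-E01": ("+", "++"),
    "SRC07-E01": ("++", "-"),
}


def favors_h3(ratings):
    """Return evidence IDs whose H3 rating is strictly higher than their H1 rating."""
    result = []
    for evidence_id, (h1, h3) in ratings.items():
        if h1 not in SCALE or h3 not in SCALE:
            continue  # N/A cells cannot be compared
        if SCALE[h3] > SCALE[h1]:
            result.append(evidence_id)
    return result


h3_leaning = favors_h3(H1_H3)
print(h3_leaning)  # ['SRC05-E01', 'SRC06-E01']
```

On this reading, the two items listed above are the only rows that pull toward H3 rather than H1.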

Least Diagnostic Evidence

Evidence ID	Why Non-Diagnostic
SRC01-E01	Overview source that supports H1 and contradicts H2 but does not discriminate between H1 and H3.
SRC04-E01	GRPO is clearly an alternative, but its relationship to the H3 "modification" reading is ambiguous, so it is rated N/A for H3.

Outcome

Hypothesis supported: H1 — Multiple viable alternatives exist and are in active use. The evidence unambiguously shows at least six alternatives deployed across multiple labs.

Hypotheses eliminated: H2 — No evidence supports H2. Every source documents at least one viable alternative.

Hypotheses inconclusive: H3 — Partially supported as a qualifier to H1. The HALO framework and DPO's mathematical derivation from RLHF support reading most alternatives as evolutionary. However, ORPO's structural simplification and GRPO+RLVR's elimination of subjective feedback demonstrate genuinely novel departures.