R0057/2026-04-01/C006/H1


Research	R0057 — RLHF Yes-Men Claims v3
Run	2026-04-01
Claim	C006
Hypothesis	H1

Statement

All six named alternatives (DPO, KTO, GRPO, Constitutional AI, ORPO, RLVR) exist and qualify as major.

Status

Current: Supported

Supporting Evidence

Evidence	Summary
SRC01-E01	All six named alternatives (DPO, KTO, GRPO, Constitutional AI, ORPO, RLVR) are documented in the literature

Contradicting Evidence

Evidence	Summary
—	No contradicting evidence found

Reasoning

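All six alternatives are documented across multiple technical surveys. DPO eliminates the separate reward model by optimizing the policy directly on preference pairs. KTO uses binary (desirable or undesirable) feedback on individual outputs instead of paired comparisons. GRPO replaces the learned value function with group-relative advantages computed over several sampled responses to the same prompt. Constitutional AI uses AI-generated feedback guided by a written set of principles. ORPO combines SFT and preference optimization in a single training stage. RLVR uses programmatic verifiers to produce rewards.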
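
To make three of these distinctions concrete, the following is a minimal Python sketch rather than a reference implementation. It illustrates the DPO objective for a single preference pair, GRPO-style group-relative advantages, and an RLVR-style programmatic reward. The function names, the `beta` default, the use of population standard deviation, and the exact-match verifier are all assumptions chosen for illustration and do not come from the evidence cited here.

```python
import math
from statistics import mean, pstdev


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (hypothetical helper).

    No reward model is involved: the loss depends only on log-probabilities
    from the policy and a frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) written as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each reward against its own group.

    `rewards` holds the scores of several responses sampled for one prompt.
    Population standard deviation is an assumption of this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def verifiable_reward(response, expected_answer):
    """RLVR-style reward: a programmatic check, here a plain exact match."""
    return 1.0 if response.strip() == expected_answer.strip() else 0.0
```

In this sketch, `dpo_loss` falls below `log(2)` whenever the policy prefers the chosen response more strongly than the reference does, and `group_relative_advantages` needs no learned critic because the group mean serves as the baseline.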

Relationship to Other Hypotheses

H1 represents full accuracy. H2 allows for partial correctness. H3 is eliminated by the evidence.