R0057/2026-04-01/C006/H1
Statement
All six alternatives exist and qualify as major
Status
Current: Supported
Supporting Evidence
| Evidence | Summary |
|---|---|
| SRC01-E01 | All six named alternatives (DPO, KTO, GRPO, Constitutional AI, ORPO, RLVR) are documented in the literature |
Contradicting Evidence
| Evidence | Summary |
|---|---|
| — | No contradicting evidence found |
Reasoning
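All six alternatives are documented across multiple technical surveys. DPO (Direct Preference Optimization) trains the policy directly on preference pairs, eliminating the separate reward model. KTO (Kahneman-Tversky Optimization) uses binary desirable/undesirable feedback on individual outputs rather than paired comparisons. GRPO (Group Relative Policy Optimization) computes advantages relative to a group of responses sampled for the same prompt. Constitutional AI uses feedback derived from a written set of principles. ORPO (Odds Ratio Preference Optimization) combines SFT and preference optimization in a single training objective. RLVR (Reinforcement Learning with Verifiable Rewards) uses programmatic verifiers to assign rewards.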
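
To make three of these mechanisms concrete, the following is a minimal sketch and not a reference implementation of any of the methods. It assumes scalar log-probabilities and rewards are already available. The function names, the `beta` default, the use of population standard deviation, and the exact-match verifier are all illustrative assumptions, not taken from the evidence sources.

```python
import math
from statistics import mean, pstdev


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO-style loss for one preference pair (no reward model involved).

    Computes -log(sigmoid(beta * margin)), where margin is the difference
    between the chosen and rejected policy/reference log-ratios.
    beta=0.1 is an assumed illustrative value.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(x)) == log(1 + exp(-x))
    return math.log1p(math.exp(-margin))


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: each reward normalized against its own group.

    Population standard deviation is an assumption of this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def verifier_reward(answer, expected):
    """RLVR-style reward from a programmatic check (hypothetical exact match)."""
    return 1.0 if answer.strip() == expected.strip() else 0.0


if __name__ == "__main__":
    # Four sampled answers to one prompt, scored by the verifier.
    samples = ["42", "41", " 42 ", "forty-two"]
    rewards = [verifier_reward(s, "42") for s in samples]
    print(rewards)
    print(group_relative_advantages(rewards))
    print(dpo_loss(-1.0, -2.0, -1.5, -1.5))
```

In this sketch the verifier supplies the rewards that the group-relative step then normalizes, which is one common way the two ideas are paired; the DPO function is independent of both and only needs log-probabilities from the policy and a frozen reference model.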
Relationship to Other Hypotheses
H1 represents full accuracy. H2 allows for partial correctness. H3 is eliminated by the evidence.