R0055/2026-04-01/C007
Claim: Six major alternatives to RLHF have emerged since 2022 (DPO, Constitutional AI, GRPO, KTO, ORPO, RLVR)
BLUF: Substantially correct. All six named methods exist and have emerged as alternatives or complements to standard RLHF. All are documented: Constitutional AI (2022), DPO (2023), KTO (2024), ORPO (2024), GRPO (introduced 2024, widely adopted 2025) and RLVR (2024-2025). Whether all six count as 'major' is debatable. DPO and GRPO are widely adopted, while KTO and ORPO see narrower use.
Probability: Very likely (80-95%) | Confidence: High
Summary
Hypotheses
| ID | Hypothesis | Status |
|---|---|---|
| H1 | Claim is accurate as stated | Inconclusive |
| H2 | Claim is partially correct or correct with caveats | Supported |
| H3 | Claim is materially wrong | Eliminated |
Searches
| ID | Target | Results | Selected |
|---|---|---|---|
| S01 | RLHF alternatives DPO Constitutional AI GRPO KTO O | 10 | 2 |
Sources
| Source | Description | Reliability | Relevance |
|---|---|---|---|
| SRC01 | Post-training survey 2026 | Medium | High |
Revisit Triggers
- New methods displacing any of these six; changes in adoption patterns