Skip to content

R0055/2026-04-01/C007 — Claim Definition

Claim as Received

Six major alternatives to RLHF have emerged since 2022 (DPO, Constitutional AI, GRPO, KTO, ORPO, RLVR)

Claim as Clarified

Six major alternatives to RLHF have emerged since 2022 (DPO, Constitutional AI, GRPO, KTO, ORPO, RLVR)

BLUF

Substantially correct. All six named methods exist and have emerged as alternatives or complements to standard RLHF. DPO (2023), Constitutional AI (2022), GRPO (2025), KTO (2024), ORPO (2024), and RLVR (2024-2025) are all documented. Whether they are all 'major' is debatable — some like DPO and GRPO are widely adopted while others like KTO and ORPO have narrower use.

Scope

  • Domain: AI alignment, sycophancy, enterprise AI
  • Timeframe: 2022-2026
  • Testability: Verifiable against published research and documentation

Assessment Summary

Probability: Very likely (80-95%)

Confidence: High

Hypothesis outcome: H2 prevails — see assessment for details.

[Full assessment in assessment.md.]

Status

Field Value
Date created 2026-04-01
Date completed 2026-04-01
Researcher profile Phillip Moore
Prompt version Unified Research Methodology v1
Revisit by 2026-10-01
Revisit trigger New methods displacing any of these six; changes in adoption patterns