S05 — KTO, ORPO, SimPO, and Other DPO Variants

Summary

Source / Database: Web (Google via WebSearch) + arXiv
Query terms: "KTO kahneman tversky optimization RLHF alternative 2024 2025"; "ORPO odds ratio preference optimization SimPO simple preference optimization 2024 2025"; "SPIN self-play fine-tuning IPO identity preference optimization RLHF alternatives"
Filters: None
Results returned: 30 (10 per query)
Results selected: 5
Results rejected: 25

Selected Results

| Result | Title | URL | Rationale |
|---|---|---|---|
| S05-R01 | KTO: Model Alignment as Prospect Theoretic Optimization (arXiv) | https://arxiv.org/abs/2402.01306 | Primary KTO paper |
| S05-R02 | ORPO: Monolithic Preference Optimization without Reference Model (arXiv) | https://arxiv.org/abs/2403.07691 | Primary ORPO paper |
| S05-R03 | RLHF and alternatives: Overview (Argilla) | https://argilla.io/blog/mantisnlp-rlhf-part-9/ | Comprehensive overview of all variants |
| S05-R04 | DPO Isn't Enough: The Modern Post-Training Stack (Medium) | https://medium.com/@fahey_james/dpo-isnt-enough-the-modern-post-training-stack-simpo-orpo-kto-and-beyond-d82e52a1ee6c | Current stack analysis |
| S05-R05 | Self-Play Preference Optimization (SPPO) | https://uclaml.github.io/SPPO/ | Self-play approach |

Rejected Results

| Result | Title | URL | Rationale |
|---|---|---|---|
| S05-R06 – S05-R30 | Various secondary sources | Various | Tutorials, duplicate coverage, or narrow application papers |

Notes

This search covered the "long tail" of RLHF alternatives that emerged from the DPO lineage. Each variant addresses a specific limitation of vanilla DPO: KTO relaxes the data requirement from paired preferences to per-example desirable/undesirable labels; ORPO removes the dependence on a frozen reference model by folding a preference term into SFT; IPO counters DPO's tendency to overfit the preference data; SimPO simplifies the objective with a reference-free, length-normalized implicit reward; SPIN replaces human preference pairs with self-play against earlier model checkpoints.
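
To make the "reference model dependence" and "simplicity" points concrete, here is a minimal sketch contrasting the per-pair DPO and SimPO losses on scalar sequence log-probabilities. The function names and hyperparameter values are illustrative, not taken from the cited papers' code; the formulas follow the published objectives (DPO scores policy-vs-reference log-ratios, SimPO scores length-normalized policy log-probs minus a target margin).

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """DPO pairwise loss: requires log-probs from a frozen reference model.

    loss = -log sigma(beta * [(logpi_w - logref_w) - (logpi_l - logref_l)])
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(sigmoid(margin))

def simpo_loss(logp_w: float, logp_l: float,
               len_w: int, len_l: int,
               beta: float = 2.0, gamma: float = 0.5) -> float:
    """SimPO pairwise loss: reference-free. The implicit reward is the
    length-normalized average log-probability, and gamma is a target
    reward margin between chosen and rejected responses.
    """
    margin = beta * (logp_w / len_w - logp_l / len_l) - gamma
    return -math.log(sigmoid(margin))

# A wider gap between chosen and rejected responses lowers the loss.
strong = simpo_loss(-10.0, -40.0, 20, 20)  # chosen strongly preferred
weak = simpo_loss(-20.0, -40.0, 20, 20)    # weaker preference -> higher loss
print(strong, weak)
```

Note that `simpo_loss` takes no reference log-probs at all, which is exactly the simplification SimPO claims; ORPO achieves reference-freeness differently, via an odds-ratio penalty added to the SFT loss.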