S05 — KTO, ORPO, SimPO, and Other DPO Variants
Summary
| Item | Details |
|---|---|
| Source / Database | Web (Google via WebSearch) + arXiv |
| Query terms | "KTO kahneman tversky optimization RLHF alternative 2024 2025"; "ORPO odds ratio preference optimization SimPO simple preference optimization 2024 2025"; "SPIN self-play fine-tuning IPO identity preference optimization RLHF alternatives" |
| Filters | None |
| Results returned | 30 (10 per query) |
| Results selected | 5 |
| Results rejected | 25 |
Selected Results
Rejected Results
| Result | Title | URL | Rationale |
|---|---|---|---|
| S05-R06-30 | Various secondary sources | Various | Tutorials, duplicate coverage, or narrow application papers |
Notes
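This search covered the "long tail" of RLHF alternatives that emerged from the DPO lineage. Each variant targets a specific limitation of DPO: KTO (the need for paired preference data; it learns from unpaired desirable/undesirable labels), ORPO (dependence on a reference model and a separate SFT stage), IPO (overfitting when preferences are near-deterministic), SimPO (dependence on a reference model and a mismatch between the training reward and generation likelihood, addressed with a length-normalized reward), and SPIN (the need for preference data beyond the SFT set; the model learns to distinguish SFT responses from its own generations).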
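To make the differences between these variants concrete, the sketch below writes each method's per-example loss as plain Python over sequence log-probabilities. It is a minimal illustration of the objectives as commonly stated in the DPO, IPO, SimPO, ORPO, and KTO papers, not code from any selected result. The inputs are assumed to be precomputed token-summed log-probabilities. The function names, default hyperparameters (`beta`, `tau`, `gamma`, `lam`, `lam_d`, `lam_u`), and the scalar `z_ref` standing in for KTO's batch-level KL estimate are all hypothetical choices made for illustration.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sigmoid(x: float) -> float:
    return math.exp(log_sigmoid(x))


# Hypothetical inputs: sequence log-probabilities summed over tokens (assumed
# precomputed). pi_* come from the policy, ref_* from a frozen reference model,
# w = chosen (winning) response, l = rejected (losing) response,
# len_* = token counts. Default hyperparameters are illustrative only.

def dpo_loss(pi_w, pi_l, ref_w, ref_l, beta=0.1):
    """Baseline DPO: logistic loss on the difference of policy/reference log-ratios."""
    margin = (pi_w - ref_w) - (pi_l - ref_l)
    return -log_sigmoid(beta * margin)


def ipo_loss(pi_w, pi_l, ref_w, ref_l, tau=0.1):
    """IPO: squared loss that pulls the same margin toward 1/(2*tau)
    instead of pushing it without bound, which limits overfitting."""
    margin = (pi_w - ref_w) - (pi_l - ref_l)
    return (margin - 1.0 / (2.0 * tau)) ** 2


def simpo_loss(pi_w, pi_l, len_w, len_l, beta=2.0, gamma=1.0):
    """SimPO: no reference model; the implicit reward is the
    length-normalized log-probability, with a target margin gamma."""
    reward_w = beta * pi_w / len_w
    reward_l = beta * pi_l / len_l
    return -log_sigmoid(reward_w - reward_l - gamma)


def orpo_loss(pi_w, pi_l, len_w, len_l, lam=0.1):
    """ORPO: SFT negative log-likelihood on the chosen response plus a
    weighted odds-ratio term; no reference model, no separate SFT stage."""
    avg_w, avg_l = pi_w / len_w, pi_l / len_l  # mean token log-probs, must be < 0
    # log odds(y) = log p - log(1 - p), where p = exp(mean token log-prob)
    log_odds_w = avg_w - math.log1p(-math.exp(avg_w))
    log_odds_l = avg_l - math.log1p(-math.exp(avg_l))
    sft_nll = -avg_w
    return sft_nll - lam * log_sigmoid(log_odds_w - log_odds_l)


def kto_loss(pi_y, ref_y, desirable, z_ref, beta=0.1, lam_d=1.0, lam_u=1.0):
    """KTO: unpaired loss for one response labeled desirable or undesirable.
    z_ref is a hypothetical stand-in for the batch KL estimate, held constant."""
    r = pi_y - ref_y  # implicit reward: policy/reference log-ratio
    if desirable:
        return lam_d - lam_d * sigmoid(beta * (r - z_ref))
    return lam_u - lam_u * sigmoid(beta * (z_ref - r))
```

Scalar floats keep the comparison readable; a training implementation would compute the same quantities over batched tensors. SPIN is omitted because its per-iteration objective has the same logistic form as `dpo_loss`, with the previous iteration's model as the reference, the SFT response as the chosen response, and the model's own generation as the rejected one.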