R0040/2026-04-01/Q001/S04
WebSearch — KTO, ORPO, IPO, and SPIN preference optimization methods
Summary
| Field | Value |
| --- | --- |
| Source/Database | WebSearch |
| Query terms | KTO Kahneman-Tversky optimization ORPO IPO preference alignment LLM; SPIN self-play fine-tuning OAIF online AI feedback SteerLM preference optimization methods LLM 2025 |
| Filters | None |
| Results returned | 20 (two searches combined) |
| Results selected | 4 |
| Results rejected | 16 |
Selected Results
Rejected Results
| Result | Title | URL | Rationale |
| --- | --- | --- | --- |
| S04-R05 | KTO PDF | https://arxiv.org/pdf/2402.01306 | Same paper as R01, PDF format |
| S04-R06 | Model Alignment Process (Prem AI) | https://blog.premai.io/model-alignment-process/ | General overview, less rigorous |
| S04-R07 | KTO Medium article | https://medium.com/@SpielmitDaten/kahneman-tversky-optimization-kto-revolutionizing-language-model-training-with-prospect-theory-99f30c50481e | Popular explainer, duplicates content |
| S04-R08 | Stanford CS224N final project | https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1244/final-projects/SooWeiKoh.pdf | Student project, lower authority |
| S04-R09 | RLHF and alternatives: KTO (Argilla) | https://argilla.io/blog/mantisnlp-rlhf-part-7/ | Data platform blog, commercial context |
| S04-R10 | Vinija's Notes on LLM Alignment | https://vinija.ai/concepts/llm-alignment/ | Personal notes, not peer-reviewed |
| S04-R11 | Evolution of Preference Optimization Techniques | https://hippocampus-garden.com/preference_optimization/ | Blog overview, duplicates content |
| S04-R12 | SPIN verl documentation | https://verl.readthedocs.io/en/latest/algo/spin.html | Implementation docs, not analytical |
| S04-R13 | SPIN GitHub | https://github.com/uclaml/SPIN | Code repository, not analytical |
| S04-R14 | SPIN project page | https://uclaml.github.io/SPIN/ | Project page, duplicates arxiv content |
| S04-R15 | SPIN ACM proceedings | https://dl.acm.org/doi/10.5555/3692070.3692326 | Same paper as R04, different venue |
| S04-R16 | ExSPIN | https://www.mdpi.com/1099-4300/27/3/235 | Domain-specific extension, not core |
| S04-R17 | SPIN arxiv HTML | https://arxiv.org/html/2401.01335v1 | Same paper as R04, HTML format |
| S04-R18 | SPIN HuggingFace paper page | https://huggingface.co/papers/2401.01335 | Same paper as R04, aggregator |
| S04-R19 | SPIN Emergent Mind | https://www.emergentmind.com/papers/2401.01335 | Same paper, aggregator |
| S04-R20 | LLM-self-play GitHub | https://github.com/thomasgauthier/LLM-self-play | Implementation, not analytical |
Notes
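KTO stands out as the most methodologically novel approach in this search: it grounds alignment in prospect theory, maximizing the utility of generations from a binary desirable/undesirable signal rather than maximizing the log-likelihood of pairwise preferences. SPIN represents a different paradigm (self-play): the model is trained against responses generated by its own previous iteration, which reduces dependence on external preference data.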
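To make the prospect-theory framing concrete, here is a minimal sketch of a per-example KTO-style loss in plain Python. It is an illustration under stated assumptions, not the reference implementation: the function name `kto_example_loss`, the default values for `beta`, `lambda_d`, and `lambda_u`, and the idea of passing the KL reference point in as a ready-made number (`kl_ref`) are all choices made for this note. In practice the reference point is estimated from a batch rather than supplied by hand.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def kto_example_loss(
    logratio: float,
    desirable: bool,
    kl_ref: float = 0.0,   # assumed given; stands in for the batch-level KL estimate
    beta: float = 0.1,     # illustrative default, not a recommended setting
    lambda_d: float = 1.0,  # weight on desirable examples (assumption)
    lambda_u: float = 1.0,  # weight on undesirable examples (assumption)
) -> float:
    """Per-example loss: lambda_y minus the prospect-theoretic value.

    logratio is log pi_theta(y|x) - log pi_ref(y|x) for one completion.
    The label is a single desirable/undesirable flag, not a preference pair.
    """
    if desirable:
        # Gains: value grows as the policy favors y more than the reference point.
        value = lambda_d * sigmoid(beta * (logratio - kl_ref))
        return lambda_d - value
    # Losses: value grows as the policy favors y less than the reference point.
    value = lambda_u * sigmoid(beta * (kl_ref - logratio))
    return lambda_u - value
```

The point of the sketch is the input signature: each example carries one completion and one binary label, whereas pairwise methods such as DPO need a chosen and a rejected completion for the same prompt.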