Research R0040 — RLHF Alternatives
Run 2026-04-01
Query Q001
Search S04

WebSearch — KTO, ORPO, IPO, and SPIN preference optimization methods

Summary

| Field | Value |
| --- | --- |
| Source/Database | WebSearch |
| Query terms | KTO Kahneman-Tversky optimization ORPO IPO preference alignment LLM; SPIN self-play fine-tuning OAIF online AI feedback SteerLM preference optimization methods LLM 2025 |
| Filters | None |
| Results returned | 20 (two searches combined) |
| Results selected | 4 |
| Results rejected | 16 |

Selected Results

| Result | Title | URL | Rationale |
| --- | --- | --- | --- |
| S04-R01 | KTO: Model Alignment as Prospect Theoretic Optimization | https://arxiv.org/abs/2402.01306 | Original KTO paper -- primary source |
| S04-R02 | Better, Cheaper, Faster LLM Alignment with KTO | https://contextual.ai/better-cheaper-faster-llm-alignment-with-kto/ | KTO authors' technical blog with practical details |
| S04-R03 | Preference Tuning LLMs with DPO Methods | https://huggingface.co/blog/pref-tuning | HuggingFace overview of IPO, KTO, and DPO variants |
| S04-R04 | Self-Play Fine-Tuning Converts Weak to Strong LLMs | https://arxiv.org/abs/2401.01335 | Original SPIN paper -- primary source |

Rejected Results

| Result | Title | URL | Rationale |
| --- | --- | --- | --- |
| S04-R05 | KTO PDF | https://arxiv.org/pdf/2402.01306 | Same paper as R01, PDF format |
| S04-R06 | Model Alignment Process (Prem AI) | https://blog.premai.io/model-alignment-process/ | General overview, less rigorous |
| S04-R07 | KTO Medium article | https://medium.com/@SpielmitDaten/kahneman-tversky-optimization-kto-revolutionizing-language-model-training-with-prospect-theory-99f30c50481e | Popular explainer, duplicates content |
| S04-R08 | Stanford CS224N final project | https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1244/final-projects/SooWeiKoh.pdf | Student project, lower authority |
| S04-R09 | RLHF and alternatives: KTO (Argilla) | https://argilla.io/blog/mantisnlp-rlhf-part-7/ | Data platform blog, commercial context |
| S04-R10 | Vinija's Notes on LLM Alignment | https://vinija.ai/concepts/llm-alignment/ | Personal notes, not peer-reviewed |
| S04-R11 | Evolution of Preference Optimization Techniques | https://hippocampus-garden.com/preference_optimization/ | Blog overview, duplicates content |
| S04-R12 | SPIN verl documentation | https://verl.readthedocs.io/en/latest/algo/spin.html | Implementation docs, not analytical |
| S04-R13 | SPIN GitHub | https://github.com/uclaml/SPIN | Code repository, not analytical |
| S04-R14 | SPIN project page | https://uclaml.github.io/SPIN/ | Project page, duplicates arxiv content |
| S04-R15 | SPIN ACM proceedings | https://dl.acm.org/doi/10.5555/3692070.3692326 | Same paper as R04, different venue |
| S04-R16 | ExSPIN | https://www.mdpi.com/1099-4300/27/3/235 | Domain-specific extension, not core |
| S04-R17 | SPIN arxiv HTML | https://arxiv.org/html/2401.01335v1 | Same paper as R04, HTML format |
| S04-R18 | SPIN HuggingFace paper page | https://huggingface.co/papers/2401.01335 | Same paper as R04, aggregator |
| S04-R19 | SPIN Emergent Mind | https://www.emergentmind.com/papers/2401.01335 | Same paper as R04, aggregator |
| S04-R20 | LLM-self-play GitHub | https://github.com/thomasgauthier/LLM-self-play | Implementation, not analytical |

Notes

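KTO stands out as the most methodologically novel of the methods surveyed: it grounds alignment in prospect theory, optimizing a utility over individual outputs labeled desirable or undesirable rather than maximizing the likelihood of pairwise preferences. SPIN represents a different paradigm, self-play, in which the model is trained against responses generated by its own previous iteration, which reduces its dependence on external preference data.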
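
To make the contrast between pairwise preference maximization and KTO's prospect-theoretic objective concrete, here is a minimal per-example sketch based on the loss forms given in the DPO and KTO papers. It is an illustration, not the authors' implementation. The function names, the scalar `logratio` inputs (log of policy probability over reference probability for a response), and the fixed `kl_ref` reference point are all assumptions made here; in the KTO paper the reference point is a KL estimate computed from the batch rather than a constant.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def dpo_pair_loss(logratio_chosen: float, logratio_rejected: float,
                  beta: float = 0.1) -> float:
    """DPO-style loss for one (chosen, rejected) pair.

    Each logratio is log pi_theta(y|x) - log pi_ref(y|x) for that response.
    The loss needs BOTH responses to the same prompt.
    """
    return -math.log(sigmoid(beta * (logratio_chosen - logratio_rejected)))


def kto_example_loss(logratio: float, desirable: bool,
                     kl_ref: float = 0.0, beta: float = 0.1,
                     lambda_d: float = 1.0, lambda_u: float = 1.0) -> float:
    """KTO-style loss for one response with a binary desirable/undesirable label.

    kl_ref stands in for the reference point z_0 (assumed constant here).
    Desirable outputs gain value as logratio rises above the reference point;
    undesirable outputs gain value as it falls below.
    """
    if desirable:
        value = lambda_d * sigmoid(beta * (logratio - kl_ref))
        return lambda_d - value
    value = lambda_u * sigmoid(beta * (kl_ref - logratio))
    return lambda_u - value


if __name__ == "__main__":
    # DPO needs a preference pair; KTO scores each labeled output on its own.
    print(dpo_pair_loss(logratio_chosen=1.5, logratio_rejected=-0.5))
    print(kto_example_loss(logratio=1.5, desirable=True))
    print(kto_example_loss(logratio=1.5, desirable=False))
```

The signatures show the practical difference: `dpo_pair_loss` cannot be evaluated without a matched chosen and rejected response, while `kto_example_loss` only needs a single output and a binary label.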