S04


Research	R0040 — RLHF Alternatives
Run	2026-04-01
Query	Q001
Search	S04

WebSearch — KTO, ORPO, IPO, and SPIN preference optimization methods

Summary

Field	Value
Source/Database	WebSearch
Query terms	KTO Kahneman-Tversky optimization ORPO IPO preference alignment LLM; SPIN self-play fine-tuning OAIF online AI feedback SteerLM preference optimization methods LLM 2025
Filters	None
Results returned	20 (two searches combined)
Results selected	4
Results rejected	16

Selected Results

Result	Title	URL	Rationale
S04-R01	KTO: Model Alignment as Prospect Theoretic Optimization	https://arxiv.org/abs/2402.01306	Original KTO paper -- primary source
S04-R02	Better, Cheaper, Faster LLM Alignment with KTO	https://contextual.ai/better-cheaper-faster-llm-alignment-with-kto/	KTO authors' technical blog with practical details
S04-R03	Preference Tuning LLMs with DPO Methods	https://huggingface.co/blog/pref-tuning	Hugging Face overview comparing DPO, IPO, and KTO
S04-R04	Self-Play Fine-Tuning Converts Weak to Strong LLMs	https://arxiv.org/abs/2401.01335	Original SPIN paper -- primary source

Rejected Results

Result	Title	URL	Rationale
S04-R05	KTO PDF	https://arxiv.org/pdf/2402.01306	Same paper as R01, PDF format
S04-R06	Model Alignment Process (Prem AI)	https://blog.premai.io/model-alignment-process/	General overview, less rigorous
S04-R07	KTO Medium article	https://medium.com/@SpielmitDaten/kahneman-tversky-optimization-kto-revolutionizing-language-model-training-with-prospect-theory-99f30c50481e	Popular explainer, duplicates content
S04-R08	Stanford CS224N final project	https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1244/final-projects/SooWeiKoh.pdf	Student project, lower authority
S04-R09	RLHF and alternatives: KTO (Argilla)	https://argilla.io/blog/mantisnlp-rlhf-part-7/	Data platform blog, commercial context
S04-R10	Vinija's Notes on LLM Alignment	https://vinija.ai/concepts/llm-alignment/	Personal notes, not peer-reviewed
S04-R11	Evolution of Preference Optimization Techniques	https://hippocampus-garden.com/preference_optimization/	Blog overview, duplicates content
S04-R12	SPIN verl documentation	https://verl.readthedocs.io/en/latest/algo/spin.html	Implementation docs, not analytical
S04-R13	SPIN GitHub	https://github.com/uclaml/SPIN	Code repository, not analytical
S04-R14	SPIN project page	https://uclaml.github.io/SPIN/	Project page, duplicates arxiv content
S04-R15	SPIN ACM proceedings	https://dl.acm.org/doi/10.5555/3692070.3692326	Same paper as R04, different venue
S04-R16	ExSPIN	https://www.mdpi.com/1099-4300/27/3/235	Domain-specific extension, not core
S04-R17	SPIN arxiv HTML	https://arxiv.org/html/2401.01335v1	Same paper as R04, HTML format
S04-R18	SPIN HuggingFace paper page	https://huggingface.co/papers/2401.01335	Same paper as R04, aggregator
S04-R19	SPIN Emergent Mind	https://www.emergentmind.com/papers/2401.01335	Same paper, aggregator
S04-R20	LLM-self-play GitHub	https://github.com/thomasgauthier/LLM-self-play	Implementation, not analytical

Notes

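Among the selected results, KTO is the most methodologically novel: it derives its loss from prospect theory (Kahneman and Tversky's model of human utility) and needs only a binary desirable/undesirable label per output, rather than maximizing the likelihood of paired preferences as DPO does. SPIN follows a different paradigm, self-play: the model is trained to distinguish its own generations from the human-written responses in an existing SFT dataset, which reduces its dependence on additional external preference data.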