SRC05

Paper introducing KTO (Kahneman-Tversky Optimization), which applies prospect theory to LLM alignment.

Source

Field	Value
Title	KTO: Model Alignment as Prospect Theoretic Optimization
Publisher	ICML 2024
Author(s)	Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela
Date	2024-02-02
URL	https://arxiv.org/abs/2402.01306
Type	Research paper (peer-reviewed)

Dimension	Rationale
Reliability	Peer-reviewed at ICML 2024. Authors include Dan Jurafsky (a leading NLP researcher) and Douwe Kiela (a well-known ML researcher). Provides both a theoretical framework and empirical validation at model scales from 1B to 30B parameters.
Relevance	Introduces a theoretically novel alignment objective grounded in behavioral economics (prospect theory) rather than RL, as an alternative to preference-based methods. Also provides the "human-aware losses" (HALOs) framework, which contextualizes DPO and related methods.
Bias flags	No significant concerns. The authors come from academic and industry research groups, with no clear commercial conflict of interest.

Evidence ID	Summary
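SRC05-E01	KTO learns from binary feedback (each output labeled desirable or undesirable) using a prospect-theoretic objective, matching DPO performance while requiring only per-output labels rather than paired preferences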
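
To make SRC05-E01 concrete, here is a minimal per-example sketch of a KTO-style loss in plain Python. It follows the general shape of the objective described in the paper: an implied reward (the log-ratio of policy to reference probability) is compared against a reference point and passed through a sigmoid, with the sign flipped for undesirable outputs. The function and parameter names (`kto_example_loss`, `kl_ref`, `lambda_d`, `lambda_u`) and the default values are illustrative assumptions, not the authors' code. In particular, the paper estimates the reference point from a batch, whereas this sketch simply takes it as a given number.

```python
import math


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def kto_example_loss(
    policy_logp: float,   # log pi_theta(y | x), summed over tokens
    ref_logp: float,      # log pi_ref(y | x), summed over tokens
    desirable: bool,      # the binary feedback signal for this output
    kl_ref: float = 0.0,  # reference point; assumed given here, not estimated
    beta: float = 0.1,    # assumed default; controls risk aversion
    lambda_d: float = 1.0,  # weight on desirable examples (assumed default)
    lambda_u: float = 1.0,  # weight on undesirable examples (assumed default)
) -> float:
    # Implied reward: how much more likely the policy makes y than the reference.
    reward = policy_logp - ref_logp
    if desirable:
        # Gains: value grows as the reward rises above the reference point.
        value = lambda_d * sigmoid(beta * (reward - kl_ref))
        return lambda_d - value
    # Losses: value grows as the reward falls below the reference point.
    value = lambda_u * sigmoid(beta * (kl_ref - reward))
    return lambda_u - value


# One desirable and one undesirable output; no paired preference is needed.
loss_good = kto_example_loss(policy_logp=-5.0, ref_logp=-10.0, desirable=True)
loss_bad = kto_example_loss(policy_logp=-5.0, ref_logp=-10.0, desirable=False)
```

Each call consumes a single labeled output, which is the "simpler data requirements" point: unlike DPO, no chosen/rejected pair for the same prompt is required.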