R0040/2026-03-28/Q001/SRC05
KTO paper applying prospect theory to LLM alignment.
Source
| Field | Value |
|---|---|
| Title | KTO: Model Alignment as Prospect Theoretic Optimization |
| Publisher | ICML 2024 |
| Author(s) | Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela |
| Date | 2024-02-02 |
| URL | https://arxiv.org/abs/2402.01306 |
| Type | Research paper (peer-reviewed) |

Summary
| Dimension | Rating |
|---|---|
| Reliability | High |
| Relevance | High |
| Bias: Missing data | Low risk |
| Bias: Measurement | Low risk |
| Bias: Selective reporting | Low risk |
| Bias: Randomization | N/A |
| Bias: Protocol deviation | N/A |
| Bias: COI/Funding | Low risk |

Rationale
| Dimension | Rationale |
|---|---|
| Reliability | Peer-reviewed at ICML 2024. Authors include Jurafsky (leading NLP researcher) and Kiela (well-known ML researcher). Provides both a theoretical framework and empirical validation at model scales from 1B to 30B parameters. |
| Relevance | Introduces a theoretically novel alignment method grounded in behavioral economics (prospect theory) rather than RL. Also provides the "human-aware losses" (HALOs) framework that contextualizes DPO and related methods. |
| Bias flags | No significant concerns. Authors come from academic and industry research groups with no clear commercial conflict. |

| Evidence ID | Summary |
|---|---|
| SRC05-E01 | KTO combines binary feedback signals (desirable/undesirable) with a prospect-theoretic objective, matching DPO performance with simpler data requirements (no paired preferences needed) |
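To make the "simpler data requirements" in SRC05-E01 concrete, the sketch below contrasts the paired-preference record that DPO-style training typically consumes with the per-completion binary label that KTO needs. It is a minimal illustration, not code from the paper: the field names (`prompt`, `chosen`, `rejected`, `completion`, `desirable`) and the helper `unpair` are hypothetical, and it assumes each pair is simply split into one desirable and one undesirable example.

```python
from typing import TypedDict


class PairedExample(TypedDict):
    """Hypothetical DPO-style record: two completions ranked against each other."""
    prompt: str
    chosen: str
    rejected: str


class BinaryExample(TypedDict):
    """Hypothetical KTO-style record: one completion with a thumbs-up/down label."""
    prompt: str
    completion: str
    desirable: bool


def unpair(pairs: list[PairedExample]) -> list[BinaryExample]:
    """Split each preference pair into two independently labeled examples."""
    examples: list[BinaryExample] = []
    for pair in pairs:
        # The preferred completion becomes a desirable example.
        examples.append(
            {"prompt": pair["prompt"], "completion": pair["chosen"], "desirable": True}
        )
        # The dispreferred completion becomes an undesirable example.
        examples.append(
            {"prompt": pair["prompt"], "completion": pair["rejected"], "desirable": False}
        )
    return examples


paired_data: list[PairedExample] = [
    {"prompt": "Summarize the report.", "chosen": "A concise summary.", "rejected": "Off-topic text."},
]
binary_data = unpair(paired_data)
```

The conversion only goes one way: binary-labeled data can also be collected directly, without ever showing annotators two completions side by side, which is the sense in which the data requirement is simpler.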