SRC07 — KTO: Model Alignment as Prospect Theoretic Optimization

Source

| Field | Value |
| --- | --- |
| Title | KTO: Model Alignment as Prospect Theoretic Optimization |
| Publisher | ICML 2024 / arXiv |
| Authors | Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela |
| Date | February 2024 (accepted ICML 2024) |
| URL | https://arxiv.org/abs/2402.01306 |
| Type | Peer-reviewed conference paper |

Summary Ratings

| Dimension | Rating |
| --- | --- |
| Reliability | High |
| Relevance | High |
| Missing data bias | Low |
| Measurement bias | Low |
| Selective reporting bias | Low |
| Randomization bias | N/A |
| Protocol deviation bias | Low |
| COI / Funding bias | Low |

Rationale

| Dimension | Rationale |
| --- | --- |
| Reliability | Peer-reviewed at ICML 2024; grounded in established economic theory (Kahneman and Tversky's prospect theory) |
| Relevance | Proposes aligning models on binary desirable/undesirable labels for individual outputs rather than on paired preference comparisons |
| COI / Funding | Academic and industry authors (Stanford, Contextual AI); no single commercial interest |

Evidence Extracts

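| Evidence | Summary |
| --- | --- |
| SRC07-E01 | KTO learns from binary desirable/undesirable labels on individual outputs rather than paired preference comparisons; the paper reports that it matches or exceeds preference-based methods such as DPO at scales from 1B to 30B parameters |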
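
To make the binary-signal idea in SRC07-E01 concrete, the sketch below computes a KTO-style loss for individual labeled outputs. It is a simplified illustration, not the authors' implementation. The function name `kto_example_loss`, the record fields, and the default values of `beta`, `lambda_d`, and `lambda_u` are all assumptions chosen for this example. The reference point `kl_ref`, which the paper estimates as a batch-level KL term, is passed in here as a plain number.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def kto_example_loss(policy_logp: float, ref_logp: float, desirable: bool,
                     kl_ref: float = 0.0, beta: float = 0.1,
                     lambda_d: float = 1.0, lambda_u: float = 1.0) -> float:
    """KTO-style loss for one output with a binary desirability label.

    All parameter names and defaults are hypothetical, chosen for illustration.
    """
    # Implied reward: log-ratio of policy to reference model on this output.
    reward = policy_logp - ref_logp
    # Reference point, clamped to be non-negative (assumed to be supplied).
    z0 = max(kl_ref, 0.0)
    if desirable:
        # Loss falls as the reward rises above the reference point.
        return lambda_d * (1.0 - sigmoid(beta * (reward - z0)))
    # For undesirable outputs, loss falls as the reward drops below it.
    return lambda_u * (1.0 - sigmoid(beta * (z0 - reward)))


# Each record carries one binary label; no (chosen, rejected) pair is needed.
batch = [
    {"policy_logp": -12.0, "ref_logp": -14.0, "desirable": True},
    {"policy_logp": -9.0, "ref_logp": -8.0, "desirable": False},
]
losses = [kto_example_loss(**example) for example in batch]
mean_loss = sum(losses) / len(losses)
```

The point of contrast with preference-based training is the data shape: every record stands alone with a single label, so desirable and undesirable examples need not be matched to the same prompt or appear in equal numbers.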