E01


Research	R0040 — RLHF Alternatives
Run	2026-04-01
Query	Q001
Source	SRC04
Evidence	SRC04-E01
Type	Factual

KTO applies prospect theory to LLM alignment with binary feedback

URL: https://arxiv.org/abs/2402.01306

Extract

KTO directly maximizes the utility of generations using Kahneman-Tversky prospect theory's human utility model, rather than maximizing preference log-likelihoods like DPO. Key innovation: it learns from only a binary signal of whether an output is desirable -- no preference comparisons required.

Results:
- Matches or exceeds DPO performance at scales from 1B to 30B parameters
- Requires only binary desirability labels (thumbs up/down), not ranked preference pairs
- Reduces annotation overhead, since each output needs a single up/down judgment rather than a pairwise comparison
- Introduces the concept of Human-Aware Losses (HALOs): a family of loss functions that implicitly incorporate human cognitive biases
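The utility-maximizing objective described above can be sketched per example as follows. This is a minimal illustration, not the paper's reference implementation: the function and parameter names (`kto_loss`, `kl_ref`, `lambda_d`, `lambda_u`) and the default `beta=0.1` are assumptions made here, and the reference point `kl_ref` is passed in as a plain number, whereas the paper estimates it as a KL divergence between the policy and the reference model.

```python
import math


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def kto_loss(
    policy_logp: float,   # log pi_theta(y | x) under the model being trained
    ref_logp: float,      # log pi_ref(y | x) under the frozen reference model
    desirable: bool,      # the binary feedback signal (thumbs up / down)
    kl_ref: float,        # reference point; assumed to be supplied by the caller
    beta: float = 0.1,    # hypothetical default; controls risk aversion
    lambda_d: float = 1.0,  # weight on desirable examples
    lambda_u: float = 1.0,  # weight on undesirable examples
) -> float:
    """Per-example KTO-style loss: lambda_y minus the prospect-theoretic value."""
    # Implied reward: log-ratio of policy to reference likelihood.
    reward = policy_logp - ref_logp
    if desirable:
        # Value grows as the reward rises above the reference point.
        value = lambda_d * sigmoid(beta * (reward - kl_ref))
        return lambda_d - value
    # For undesirable outputs, value grows as the reward falls below it.
    value = lambda_u * sigmoid(beta * (kl_ref - reward))
    return lambda_u - value
```

In this sketch, a desirable example whose implied reward equals the reference point has loss `lambda_d / 2`; raising the policy's likelihood of that output lowers the loss, while doing the same for an undesirable output raises it. No second completion appears anywhere in the computation, which is why no preference pairs are needed.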

Relevance to Hypotheses

Open-ended query -- maps to thematic clusters:

Cluster	Relationship	Rationale
Reward-free preference optimization	Supports	Eliminates both reward model and preference pair requirement
Data efficiency	Supports	Binary labels are far cheaper to collect than preference pairs
Theoretical grounding	Supports	Prospect theory provides principled foundation for loss design
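To make the data-format difference concrete, the sketch below contrasts a preference-pair record with a binary-feedback record. The type and function names (`PreferencePair`, `BinaryExample`, `to_binary_examples`) are hypothetical and chosen only for illustration. It uses one simple convention for reusing existing pairwise data: treat the chosen output as desirable and the rejected one as undesirable.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PreferencePair:
    """What DPO-style training needs: two completions ranked for one prompt."""
    prompt: str
    chosen: str
    rejected: str


@dataclass(frozen=True)
class BinaryExample:
    """What KTO-style training needs: one completion and an up/down label."""
    prompt: str
    completion: str
    desirable: bool


def to_binary_examples(pairs: Iterable[PreferencePair]) -> List[BinaryExample]:
    """Split each pair into two independent binary-labeled examples."""
    examples: List[BinaryExample] = []
    for pair in pairs:
        examples.append(BinaryExample(pair.prompt, pair.chosen, True))
        examples.append(BinaryExample(pair.prompt, pair.rejected, False))
    return examples
```

The conversion only goes one way: binary-labeled examples collected in isolation carry no pairing, so they cannot in general be turned back into preference pairs.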

Context

KTO is notable for its theoretical novelty -- it is the only major RLHF alternative grounded in behavioral economics. The connection to prospect theory suggests that human cognitive biases should be explicitly modeled in alignment, not treated as noise.