Research	R0040 — RLHF Alternatives
Run	2026-03-29
Query	Q001 — RLHF Alternatives
Source	SRC07
Evidence	SRC07-E01

SRC07-E01 — KTO Uses Binary Signals, Not Preferences

Extract

KTO (Kahneman-Tversky Optimization) "directly maximizes the utility of generations instead of maximizing the log-likelihood of preferences, as current methods do." It requires only "a binary signal of whether output is desirable or not" rather than comparative preferences of the form "Output A trumps output B." The source reports that KTO "matches or exceeds the performance of preference-based methods at scales from 1B to 30B." Its theoretical foundation is Kahneman and Tversky's prospect theory, specifically the concept of loss aversion.
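
As a rough illustration of what maximizing utility from a binary signal can look like, the Python sketch below computes a per-example loss in the general shape described for KTO: a sigmoid-shaped value function around a reference point, with separate weights for desirable and undesirable outputs to express loss aversion. The function name `kto_style_loss`, the parameters `beta`, `lambda_d`, `lambda_u`, and `ref_point`, and their default values are assumptions made for illustration and do not come from the extract. In particular, the reference point is passed in as a plain number here rather than estimated from the model.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def kto_style_loss(
    log_ratio: float,        # log pi_theta(y|x) - log pi_ref(y|x) for one output
    desirable: bool,         # the binary signal: True = thumbs-up, False = thumbs-down
    ref_point: float = 0.0,  # assumed reference point (hypothetical constant here)
    beta: float = 0.1,       # assumed sharpness of the value function
    lambda_d: float = 1.0,   # assumed weight on desirable outputs
    lambda_u: float = 1.0,   # assumed weight on undesirable outputs
) -> float:
    """Per-example loss: the weight minus a sigmoid-shaped value of the output."""
    if desirable:
        # Value grows as the policy raises the output's likelihood above the reference.
        value = lambda_d * sigmoid(beta * (log_ratio - ref_point))
        return lambda_d - value
    # Value grows as the policy lowers the output's likelihood below the reference.
    value = lambda_u * sigmoid(beta * (ref_point - log_ratio))
    return lambda_u - value
```

Under these assumptions, no paired comparison is needed: each output contributes to the loss on its own. Setting `lambda_u` above `lambda_d` weights undesirable outputs more heavily than desirable ones, which is one way to encode loss aversion.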

Relevance to Hypotheses

Hypothesis	Relationship	Strength
H1	Strongly supports — KTO is a fundamentally different approach to alignment	Strong
H2	Contradicts — KTO demonstrates a viable production alternative	Strong
H3	Supports — KTO specifically targets the data collection problem (binary signals vs. preference pairs)	Moderate

Context

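KTO's use of binary signals rather than preferences is practically significant: thumbs-up/down feedback on single outputs is typically far more abundant than comparative preference data, which requires two or more outputs for the same prompt to be ranked against each other.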
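
The difference between the two data shapes can be sketched in Python as follows. The class and function names (`PreferencePair`, `BinaryExample`, `split_pair`, `from_thumbs`) are hypothetical and chosen only for illustration. The sketch shows an asymmetry: a preference pair can always be split into two binary examples, while a single thumbs-up/down rating cannot be turned into a pair without a second rated output for the same prompt.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PreferencePair:
    """Comparative record: 'chosen' was preferred over 'rejected' for one prompt."""
    prompt: str
    chosen: str
    rejected: str


@dataclass(frozen=True)
class BinaryExample:
    """Binary record: one output labeled desirable or not."""
    prompt: str
    completion: str
    desirable: bool


def split_pair(pair: PreferencePair) -> List[BinaryExample]:
    """Turn one preference pair into two binary-labeled examples."""
    return [
        BinaryExample(pair.prompt, pair.chosen, True),
        BinaryExample(pair.prompt, pair.rejected, False),
    ]


def from_thumbs(prompt: str, completion: str, thumbs_up: bool) -> BinaryExample:
    """Wrap a single thumbs-up/down rating as a binary example."""
    return BinaryExample(prompt, completion, thumbs_up)
```

A method that consumes `BinaryExample` records can therefore use both existing preference datasets and single-output feedback logs, whereas a method that requires `PreferencePair` records can use only the former.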

Notes

Ironically, the OpenAI GPT-4o sycophancy incident involved over-optimization on thumbs-up/down signals, which is exactly the kind of data KTO uses. This suggests that the signal type alone does not solve sycophancy.