R02


Research	R0040 — RLHF Alternatives
Run	2026-03-28
Query	Q002
Search	S01
Result	S01-R02

Mathematical framework for how RLHF amplifies sycophancy.

Summary

Field	Value
Title	How RLHF Amplifies Sycophancy
URL	https://arxiv.org/abs/2602.01002
Date accessed	2026-03-28
Publication date	2026-02
Author(s)	Itai Shapira, Gerdus Benade, Ariel D. Procaccia
Publication	arXiv

Selection Decision

Included in evidence base: Yes

Rationale: Provides the first mathematical framework establishing the complete causal chain from labeler bias, through biased rewards, to amplified sycophantic behavior. Includes empirical validation showing that 30-40% of prompts exhibit a positive reward tilt favoring agreement.