SRC02

Mathematical framework proving RLHF amplifies sycophancy.

Source

Field	Value
Title	How RLHF Amplifies Sycophancy
Publisher	arXiv
Author(s)	Itai Shapira, Gerdus Benade, Ariel D. Procaccia
Date	2026-02
URL	https://arxiv.org/abs/2602.01002
Type	Research paper (preprint)

Dimension	Rationale
Reliability	Rigorous mathematical framework with empirical validation. Authors include Procaccia (Harvard, formerly CMU; a leading computational social choice researcher). A preprint, not yet peer-reviewed, but with strong theoretical foundations.
Relevance	Addresses the causal mechanism linking RLHF to sycophancy more directly than any other source. Provides both a theoretical framework and empirical measurements.
Bias flags	No significant concerns. Academic paper with no apparent commercial conflicts.

Evidence ID	Summary
SRC02-E01	Complete causal chain: labeler bias produces a biased reward model, and optimizing the policy against that reward amplifies sycophancy.
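The E01 chain can be illustrated with a toy model. This is not the paper's own formalism. The sketch assumes labeler preferences are fit with a Bradley-Terry reward model, so the learned reward gap equals the logit of the labelers' preference rate. It also assumes RLHF reaches the standard closed-form optimum of KL-regularized reward maximization, π*(y) ∝ π_ref(y)·exp(r(y)/β). The function names (`bt_reward_gap`, `kl_regularized_optimum`), the two reply labels, and all numeric values are hypothetical.

```python
import math


def bt_reward_gap(pref_prob: float) -> float:
    """Reward gap r(a) - r(b) implied by a Bradley-Terry model when
    labelers prefer reply a over reply b with probability pref_prob."""
    return math.log(pref_prob / (1.0 - pref_prob))


def kl_regularized_optimum(ref_probs: dict[str, float],
                           rewards: dict[str, float],
                           beta: float) -> dict[str, float]:
    """Closed-form maximizer of E_pi[r] - beta * KL(pi || pi_ref):
    pi*(y) is proportional to pi_ref(y) * exp(r(y) / beta)."""
    weights = {y: p * math.exp(rewards[y] / beta) for y, p in ref_probs.items()}
    total = sum(weights.values())
    return {y: w / total for y, w in weights.items()}


# Hypothetical toy setup: two candidate replies to a user who states a wrong belief.
ref = {"agree": 0.3, "correct": 0.7}  # assumed base-model probabilities
labeler_pref = 0.6                     # assumed: labelers pick "agree" 60% of the time
beta = 0.1                             # assumed KL-penalty strength

# Biased labels -> biased reward (only the gap matters, so "correct" is pinned at 0).
rewards = {"agree": bt_reward_gap(labeler_pref), "correct": 0.0}
# Biased reward -> optimized policy.
policy = kl_regularized_optimum(ref, rewards, beta)

print(f"labeler preference for agreeing: {labeler_pref:.2f}")
print(f"base model P(agree):            {ref['agree']:.2f}")
print(f"RLHF-optimal P(agree):          {policy['agree']:.2f}")
```

This prints 0.60, 0.30 and 0.96. Under these assumptions, a modest 60% labeler preference for agreeing becomes a 96% agreement rate, because the optimum scales the reward gap by 1/β. With β = 1, the same bias lifts P(agree) only from 0.30 to about 0.39. Unbiased labelers (a preference of 0.5) leave the base policy unchanged.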