SRC01

Shapira, Benade, Procaccia -- How RLHF Amplifies Sycophancy (2026)

Source

Field	Value
Title	How RLHF Amplifies Sycophancy
Publisher	arXiv
Author(s)	Itai Shapira, Gerdus Benade, Ariel D. Procaccia
Date	2026-02-01
URL	https://arxiv.org/abs/2602.01002
Type	Research paper

Dimension	Rationale
Reliability	Formal mathematical analysis by established researchers (Procaccia is a well-known computational social choice theorist). Provides proofs rather than experiments alone.
Relevance	Directly proves the mechanism by which RLHF amplifies sycophancy. Most relevant source for Q002.
Bias flags	No conflicts identified. Academic research with no commercial stake.

Evidence ID	Summary
SRC01-E01	Formal proof that RLHF amplifies sycophancy via the covariance between agreement and reward; proposes a reward correction
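The covariance mechanism summarized in SRC01-E01 can be illustrated with a toy sketch. It assumes the standard KL-regularized RLHF objective, whose optimal policy is the reference policy exponentially tilted by the reward (pi proportional to pi_ref * exp(r / beta)). Under that assumption, the rate at which expected agreement changes as optimization strength lam = 1/beta grows equals Cov(agreement, reward) under the current policy, so a positive covariance pushes the policy toward agreeing with the user. This is a generic identity, not a reproduction of the paper's proof or of its proposed reward correction. The function names, responses, agreement labels, and reward values are hypothetical.

```python
import math

def tilt(p_ref, rewards, lam):
    """Exponentially tilt a reference distribution: pi(y) ∝ p_ref(y) * exp(lam * r(y)).
    Assumed here to model the optimum of KL-regularized RLHF with lam = 1/beta."""
    weights = [p * math.exp(lam * r) for p, r in zip(p_ref, rewards)]
    z = sum(weights)
    return [w / z for w in weights]

def expect(p, f):
    """Expectation of f under the discrete distribution p."""
    return sum(pi * fi for pi, fi in zip(p, f))

def cov(p, f, g):
    """Covariance of f and g under the discrete distribution p."""
    return expect(p, [fi * gi for fi, gi in zip(f, g)]) - expect(p, f) * expect(p, g)

# Hypothetical toy setup: four candidate responses to a single prompt.
p_ref = [0.25, 0.25, 0.25, 0.25]   # reference (pre-RLHF) policy
agree = [1, 1, 0, 0]               # 1 = response agrees with the user's stated view
reward = [0.9, 0.6, 0.7, 0.2]      # made-up reward-model scores

lam = 0.5
pi = tilt(p_ref, reward, lam)

# Central finite difference of expected agreement with respect to lam.
eps = 1e-6
finite_diff = (
    expect(tilt(p_ref, reward, lam + eps), agree)
    - expect(tilt(p_ref, reward, lam - eps), agree)
) / (2 * eps)

print("agreement rate before/after tilt:", expect(p_ref, agree), expect(pi, agree))
print("d/dlam E[agree]:", finite_diff, " Cov(agree, reward):", cov(pi, agree, reward))
```

In this toy setup agreement and reward are positively correlated under the reference policy, so tilting toward reward raises the agreement rate; with a zero or negative covariance the same optimization would leave agreement unchanged to first order or reduce it.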