R0041/2026-03-28/Q003/SRC03
Shapira, Benade, and Procaccia (2026) — formal mathematical analysis of how RLHF amplifies sycophancy.
Source
| Field | Value |
|---|---|
| Title | How RLHF Amplifies Sycophancy |
| Publisher | arXiv (preprint) |
| Author(s) | Itai Shapira, Gerdus Benade, Ariel D. Procaccia |
| Date | 2026 |
| URL | https://arxiv.org/html/2602.01002 |
| Type | Research paper (preprint) |
Summary
| Dimension | Rating |
|---|---|
| Reliability | High |
| Relevance | High |
| Bias: Missing data | Low risk |
| Bias: Measurement | Low risk |
| Bias: Selective reporting | Low risk |
| Bias: Randomization | N/A (not an RCT) |
| Bias: Protocol deviation | N/A (not an RCT) |
| Bias: COI/Funding | Low risk |
Rationale
| Dimension | Rationale |
|---|---|
| Reliability | Rigorous mathematical analysis with formal proofs (Theorem 1, Definition 2). Authors from credible institutions. Preprint status noted. |
| Relevance | Provides the mathematical foundation for understanding why RLHF causes sycophancy and, by contrast, why RLVR does not. |
| Bias flags | Preprint, not yet peer-reviewed. However, the mathematical proofs can be verified independently of peer review. |

| Evidence ID | Summary |
|---|---|
| SRC03-E01 | Mathematical proof that RLHF amplifies sycophancy through exponential reweighting of preference data bias |
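
The "exponential reweighting" in SRC03-E01 can be illustrated with the standard closed-form optimum of KL-regularized reward maximization, in which the tuned policy is proportional to the reference policy times `exp(reward / beta)`. The sketch below uses that textbook result only; it is not the paper's own construction or its Theorem 1. The function name `tilted_policy`, the two-response setup, and every number are illustrative assumptions.

```python
import math


def tilted_policy(ref_probs, rewards, beta):
    """Closed-form optimum of KL-regularized reward maximization:
    pi*(y) is proportional to pi_ref(y) * exp(r(y) / beta)."""
    weights = [p * math.exp(r / beta) for p, r in zip(ref_probs, rewards)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical setup: index 0 = sycophantic reply, index 1 = non-sycophantic reply.
ref_probs = [0.5, 0.5]   # assumed: the reference model has no initial preference
rewards = [0.2, 0.0]     # assumed: small reward edge learned from biased labels

# Smaller beta means weaker KL regularization, so the small edge is amplified more.
betas = (1.0, 0.1, 0.05)
p_sycophantic = [tilted_policy(ref_probs, rewards, b)[0] for b in betas]

for b, p in zip(betas, p_sycophantic):
    print(f"beta={b}: P(sycophantic)={p:.3f}")
```

Under these assumed values, a small reward advantage for the sycophantic reply turns into a growing share of probability mass as `beta` shrinks, which is the qualitative amplification the evidence summary describes.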