R0041/2026-03-28/Q003/SRC03

Research R0041 — Enterprise Sycophancy
Run 2026-03-28
Query Q003
Search S04
Result S04-R01
Source SRC03

Shapira, Benade, and Procaccia (2026) — formal mathematical analysis of how RLHF amplifies sycophancy.

Source

| Field | Value |
| --- | --- |
| Title | How RLHF Amplifies Sycophancy |
| Publisher | arXiv (preprint) |
| Author(s) | Itai Shapira, Gerdus Benade, Ariel D. Procaccia |
| Date | 2026 |
| URL | https://arxiv.org/html/2602.01002 |
| Type | Research paper (preprint) |

Summary

| Dimension | Rating |
| --- | --- |
| Reliability | High |
| Relevance | High |
| Bias: Missing data | Low risk |
| Bias: Measurement | Low risk |
| Bias: Selective reporting | Low risk |
| Bias: Randomization | N/A (not an RCT) |
| Bias: Protocol deviation | N/A (not an RCT) |
| Bias: COI/Funding | Low risk |

Rationale

| Dimension | Rationale |
| --- | --- |
| Reliability | Rigorous mathematical analysis with formal proofs (Theorem 1, Definition 2). Authors from credible institutions. Preprint status noted. |
| Relevance | Provides the mathematical foundation for understanding why RLHF causes sycophancy and, by contrast, why RLVR does not. |
| Bias flags | Preprint, not yet peer-reviewed. However, mathematical proofs are verifiable independent of peer review. |

Evidence Extracts

| Evidence ID | Summary |
| --- | --- |
| SRC03-E01 | Mathematical proof that RLHF amplifies sycophancy through exponential reweighting of preference data bias |
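
The "exponential reweighting" in SRC03-E01 can be illustrated with a minimal sketch. It is not taken from the paper. It assumes the standard closed-form solution of KL-regularized reward maximization, in which the tuned policy is proportional to the reference policy times exp(reward / beta). The function name `reweight`, the two-response setup, the reward gap of 0.2, and the beta values are all hypothetical choices made for illustration.

```python
import math


def reweight(ref_probs, rewards, beta):
    """Return the KL-regularized optimum: pi*(y) proportional to pi_ref(y) * exp(r(y) / beta).

    Assumption: this is the textbook closed form, not necessarily the exact
    setting analyzed in the paper.
    """
    weights = [p * math.exp(r / beta) for p, r in zip(ref_probs, rewards)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical toy setup: two candidate responses to a single prompt.
ref_probs = [0.5, 0.5]  # reference policy: [sycophantic, non-sycophantic]
rewards = [0.2, 0.0]    # assumed small reward bias toward the sycophantic response

# Smaller beta means weaker KL regularization, so the same reward gap
# is amplified more strongly.
for beta in (1.0, 0.1, 0.05):
    tuned = reweight(ref_probs, rewards, beta)
    odds = tuned[0] / tuned[1]
    print(f"beta={beta}: P(sycophantic)={tuned[0]:.3f}, odds={odds:.2f}")
```

Under these assumptions the odds in favor of the sycophantic response are multiplied by exp(gap / beta), so a modest bias in the learned reward grows exponentially as regularization weakens.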