

Research R0057 — RLHF Yes-Men Claims v3
Run 2026-04-01
Claim C004
Source SRC01
Evidence SRC01-E01
Type Analytical

Data-level interventions (anti-sycophancy pairs, synthetic data) reduce sycophancy without algorithmic changes

URL: https://arxiv.org/html/2602.01002

Extract

Shapira et al. propose a training-time intervention that neutralizes sycophancy amplification through a minimal reward correction derived as a closed-form agreement penalty. Wei et al. demonstrate that synthetic non-sycophantic data reduces sycophancy by 4.7-10%.
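The Wei et al. approach is a purely data-level intervention. The Python sketch below shows how such synthetic non-sycophantic data might be built: each prompt states a user opinion that is sampled independently of the ground-truth label, and the target answer depends only on that label. The `LabeledClaim` schema, the prompt `TEMPLATE`, and the function `make_non_sycophantic_examples` are hypothetical illustrations, not the authors' code or their exact prompt format.

```python
import random
from dataclasses import dataclass


@dataclass
class LabeledClaim:
    """A claim with a ground-truth label from an existing dataset (hypothetical schema)."""
    text: str
    is_true: bool


# Hypothetical prompt template: the user's stated opinion is injected into the
# prompt but carries no information about the correct answer.
TEMPLATE = (
    'I {stance} with the claim that "{claim}". '
    "Do you agree or disagree with the following claim? {claim}\n"
    "Choices:\n (A) Agree\n (B) Disagree\nAnswer:"
)


def make_non_sycophantic_examples(claims, seed=0):
    """Build (prompt, target) pairs in which the user's stance is sampled
    independently of the ground truth, so the target never tracks the user."""
    rng = random.Random(seed)  # seeded for reproducible datasets
    examples = []
    for c in claims:
        stance = rng.choice(["agree", "disagree"])
        prompt = TEMPLATE.format(stance=stance, claim=c.text)
        # The target depends only on the ground-truth label, never on `stance`.
        target = " (A) Agree" if c.is_true else " (B) Disagree"
        examples.append({"prompt": prompt, "target": target, "user_stance": stance})
    return examples


if __name__ == "__main__":
    claims = [LabeledClaim("1 + 1 = 2", True), LabeledClaim("1 + 1 = 3", False)]
    for ex in make_non_sycophantic_examples(claims):
        print(ex["user_stance"], "->", ex["target"])
```

Because the stance is drawn independently of the label, fine-tuning on these pairs gives the model no statistical reason to echo the user's opinion. The change lives entirely in the training data, and the optimizer and loss are left untouched.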

Relevance to Hypotheses

| Hypothesis | Relationship | Strength |
|---|---|---|
| H1 | Supports | Directly addresses claim accuracy |
| H2 | Supports | Allows for partial correctness |
| H3 | Contradicts | Evidence contradicts material inaccuracy |

Context

Independent research teams (Shapira et al.; Wei et al.) demonstrate the same principle: changing the data changes the behavior.