Research R0055 — RLHF Yes-Men Claims
Run 2026-04-01
Claim C006

Claim: Synthetic non-sycophantic training data produces the same sycophancy reduction as curated anti-sycophancy preference pairs

BLUF: Materially incorrect. Wei et al. (2024) showed that synthetic data reduces sycophancy, but the reductions were much smaller (4.7-10%, depending on model size) than the 84-85% achieved with curated preference pairs. The two approaches are complementary, not equivalent.
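To make the size of the gap concrete, the short sketch below compares the two ranges quoted in the BLUF. It assumes the percentages are directly comparable, which may not hold if they were measured on different benchmarks or models. The variable and function names are illustrative and do not come from the source.

```python
# Figures quoted in the BLUF; units are percentage reductions in sycophancy.
# Assumption: both ranges measure the same quantity and can be compared directly.
synthetic_reduction = (4.7, 10.0)  # synthetic data, range across model sizes
curated_reduction = (84.0, 85.0)   # curated preference pairs


def ratio_bounds(large, small):
    """Return (min, max) of large/small for two positive (low, high) ranges."""
    return large[0] / small[1], large[1] / small[0]


low, high = ratio_bounds(curated_reduction, synthetic_reduction)
print(f"Curated reduction is about {low:.1f}x to {high:.1f}x the synthetic one")
```

Under that assumption, the curated-pair reduction is several times larger than the synthetic-data reduction even at the most favorable end of the range, which is why the claim of equivalence does not hold.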

Probability: Very unlikely (5-20%) | Confidence: Medium


Summary

Entity | Description
Claim Definition | Claim text, scope, status
Assessment | Full analytical product with reasoning chain
ACH Matrix | Evidence x hypotheses diagnosticity analysis
Self-Audit | ROBIS-adapted 5-domain audit

Hypotheses

ID | Hypothesis | Status
H1 | Claim is accurate as stated | Eliminated
H2 | Claim is partially correct or correct with caveats | Inconclusive
H3 | Claim is materially wrong | Supported

Searches

ID | Target | Results | Selected
S01 | synthetic data reduces sycophancy same reduction c | 10 | 2

Sources

Source | Description | Reliability | Relevance
SRC01 | Wei et al. 2024 | High | High

Revisit Triggers

  • New synthetic data approaches achieving comparable reduction to curated pairs