C004 — Claim Definition¶


Research	R0057 — RLHF Yes-Men Claims v3
Run	2026-04-01
Claim	C004

Claim as Received¶

Curating anti-sycophancy preference pairs — training data where the correct answer disagrees with the user — dramatically reduces sycophancy without changing the algorithm at all.

Claim as Clarified¶

Curating anti-sycophancy preference pairs — training data where the correct answer disagrees with the user — dramatically reduces sycophancy without changing the algorithm at all.

BLUF¶

Confirmed. Multiple studies demonstrate that data-level interventions reduce sycophancy. Shapira et al. derive a closed-form agreement penalty as a minimal reward correction. Wei et al. show synthetic data reduces sycophancy 4.7-10%.

Scope¶

Domain: AI sycophancy research
Timeframe: Current (2024-2026)
Testability: Verifiable against published research and public records

Assessment Summary¶

Probability: Very likely (80-95%)

Confidence: High

Hypothesis outcome: H1 is supported based on available evidence.

[Full assessment in assessment.md.]

Status¶

Field	Value
Date created	2026-04-01
Date completed	2026-04-01
Researcher profile	Phillip Moore
Prompt version	Unified Research Methodology v1
Revisit by	2027-04-01
Revisit trigger	If data-level interventions are shown to be ineffective at scale