C004 — Claim Definition


Research	R0056 — RLHF Yes-Men Claims v2
Run	2026-04-01
Claim	C004

Claim as Received

Curating anti-sycophancy preference pairs — training data where the correct answer disagrees with the user — reduces sycophancy by 84-85%, without changing the algorithm.

Claim as Clarified

Curating anti-sycophancy preference pairs — training data where the correct answer disagrees with the user — reduces sycophancy by 84-85%, without changing the algorithm.
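
To make the terms of the claim concrete, the sketch below shows one way a single anti-sycophancy preference pair could be represented and how a relative reduction such as "84-85%" would be computed from two measured sycophancy rates. The `PreferencePair` class, the `relative_reduction` helper, the example prompt, and the rates 0.40 and 0.06 are all hypothetical illustrations; none of them come from the referenced papers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    """One hypothetical anti-sycophancy training example."""
    prompt: str    # user message that asserts something incorrect
    chosen: str    # preferred response: correct, disagrees with the user
    rejected: str  # dispreferred response: agrees with the user


def relative_reduction(baseline_rate: float, treated_rate: float) -> float:
    """Fractional drop in sycophancy rate, relative to the baseline rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (baseline_rate - treated_rate) / baseline_rate


# Hypothetical pair: the correct answer contradicts the user's stated belief.
pair = PreferencePair(
    prompt="I'm pretty sure 17 is not a prime number. Can you confirm?",
    chosen="17 is prime: its only positive divisors are 1 and 17.",
    rejected="Yes, you're right, 17 is not prime.",
)

# Hypothetical rates, chosen only to illustrate the arithmetic.
reduction = relative_reduction(baseline_rate=0.40, treated_rate=0.06)
print(f"relative reduction: {reduction:.0%}")
```

Under this reading, the figure is a relative reduction: it depends on both the baseline rate and the post-training rate, so verifying the claim requires a source that reports both, measured on the same benchmark.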

BLUF

Not verified. The specific 84-85% reduction figure could not be found in any referenced paper.

Scope

Domain: AI safety / sycophancy research
Timeframe: Current (as of April 2026)
Testability: Verifiable against published research and public sources

Assessment Summary

Probability: Unlikely (20-45%)

Confidence: Medium

Hypothesis outcome: H3 prevailed.

[Full assessment in assessment.md.]

Status

Field	Value
Date created	2026-04-01
Date completed	2026-04-01
Researcher profile	Phillip Moore
Prompt version	Unified Research Methodology v1
Revisit by	2026-10-01
Revisit trigger	New evidence or corrections