R0056/2026-04-01/C004

Claim: Curating anti-sycophancy preference pairs — training data where the correct answer disagrees with the user — reduces sycophancy by 84-85%, without changing the algorithm.

BLUF: Not verified. The specific 84-85% reduction figure could not be found in any referenced paper. The 84% figure in Anthropic's paper refers to model knowledge of misconceptions, not sycophancy reduction.

Probability: Unlikely (20-45%) | Confidence: Medium


Summary

Entity | Description
Claim Definition | Claim text, scope, status
Assessment | Full analytical product with reasoning chain
ACH Matrix | Evidence x hypotheses diagnosticity analysis
Self-Audit | ROBIS-adapted 5-domain audit

Hypotheses

ID | Hypothesis | Status
H1 | Claim is accurate | Inconclusive
H2 | Partially correct | Inconclusive
H3 | Materially wrong (figure not found) | Supported

Searches

ID | Target | Results | Selected
S01 | Evidence for claim | 10 | 2

Sources

Source | Description | Reliability | Relevance
SRC01 | Anthropic sycophancy research | High | Medium

Revisit Triggers

  • New evidence or corrections to cited sources
  • Replication or refutation of key findings