R0056/2026-04-01/C004
Claim: Curating anti-sycophancy preference pairs — training data where the correct answer disagrees with the user — reduces sycophancy by 84-85%, without changing the algorithm.
BLUF: Not verified. The specific 84-85% reduction figure could not be found in any referenced paper. The 84% figure in Anthropic's paper refers to model knowledge of misconceptions, not sycophancy reduction.
Probability: Unlikely (20-45%) | Confidence: Medium
Summary
Hypotheses
| ID | Hypothesis | Status |
|----|------------|--------|
| H1 | Claim is accurate | Inconclusive |
| H2 | Partially correct | Inconclusive |
| H3 | Materially wrong — figure not found | Supported |
Searches
| ID | Target | Results | Selected |
|----|--------|---------|----------|
| S01 | Evidence for claim | 10 | 2 |
Sources
| Source | Description | Reliability | Relevance |
|--------|-------------|-------------|-----------|
| SRC01 | Anthropic sycophancy research | High | Medium |
Revisit Triggers
- New evidence or corrections to cited sources
- Replication or refutation of key findings