R0024/2026-03-25/Q004/SRC01/E01¶
Anthropic sycophancy reduction metrics for Claude 4.5 model family
URL: https://www.anthropic.com/news/protecting-well-being-of-users
Extract¶
Anthropic's 4.5 model family showed substantial improvements:
- Claude Opus 4.5, Sonnet 4.5, and Haiku 4.5 each scored 70-85% lower than Opus 4.1 on metrics of sycophancy and encouragement of user delusion.
- Stress-test course-correction rates: Opus 4.5 (10% failure), Sonnet 4.5 (16.5% failure), Haiku 4.5 (37% failure), vs. Opus 4.1 (64% failure).
- Anthropic open-sourced Petri, its automated behavioral audit tool, which can be used to compare sycophancy scores across models.
- The 4.5 model family outperformed all other frontier models tested in November 2025 on Petri's sycophancy evaluation.
The organization pledges to "continue to build new protections and safeguards" and remains "committed to publishing our methods and results transparently."
Relevance to Hypotheses¶
| Hypothesis | Relationship | Strength |
|---|---|---|
| H1 | Supports | Specific before/after metrics published with an open evaluation tool |
| H2 | Contradicts | Quantitative metrics clearly exist |
| H3 | Supports | Metrics exist but are self-reported, lack binding commitment to regular reporting, and no independent audit has verified them |
Context¶
The open-sourcing of Petri is significant because it enables independent verification of sycophancy claims across models. However, Anthropic's self-reported 70-85% improvement has not yet been independently verified using this tool.
Notes¶
The 70-85% figure compares two model generations (4.1 vs 4.5). This is a relative improvement, not an absolute benchmark. It does not tell us whether the absolute level of sycophancy in 4.5 models is "acceptable" by any external standard.
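To make the relative-vs-absolute distinction concrete, the stress-test failure rates from the Extract can be converted into relative reductions against the Opus 4.1 baseline. This is our own arithmetic check, not a figure from the source, and the stress-test metric is a separate measurement from the 70-85% sycophancy-score comparison:

```python
# Stress-test course-correction failure rates from the Extract.
baseline = 0.64  # Opus 4.1
rates = {
    "Opus 4.5": 0.10,
    "Sonnet 4.5": 0.165,
    "Haiku 4.5": 0.37,
}

for name, rate in rates.items():
    # Relative reduction vs. baseline; the absolute rate is what
    # an external acceptability standard would have to judge.
    reduction = (baseline - rate) / baseline
    print(f"{name}: {reduction:.1%} relative reduction, "
          f"{rate:.1%} absolute failure rate")
```

Note how Haiku 4.5's roughly 42% relative reduction still leaves a 37% absolute failure rate, which is exactly why a relative improvement alone cannot establish that the absolute level is acceptable.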