R0041/2026-04-01/Q001/SRC02/E01
Anthropic's sycophancy reduction claims for Claude Sonnet 4.5
URL: https://www.anthropic.com/news/claude-sonnet-4-5
Extract
Anthropic states that Claude's "extensive safety training" has succeeded in "reducing concerning behaviors like sycophancy, deception, power-seeking, and the tendency to encourage delusional thinking." The company claims a 70-85% improvement in sycophancy reduction over previous model generations. The model is described as "our most aligned frontier model yet."
The announcement mentions no enterprise-specific API parameters or configurations for controlling sycophancy, and it gives no details on how sycophancy was measured. The improvements are presented as general model-wide enhancements available to all users, not enterprise-differentiated features.
Separately, Anthropic began evaluating Claude for sycophancy in 2022 and has "steadily refined how it trains, tests, and reduces sycophancy, with the most recent models being the least sycophantic to date."
Relevance to Hypotheses
| Hypothesis | Relationship | Rationale |
|---|---|---|
| H1 | Contradicts | No enterprise-specific product or API parameters offered |
| H2 | Supports | Demonstrates active, long-running research and measurable improvement |
| H3 | Contradicts | The claimed improvements are substantial, even if self-reported |
Context
The 70-85% figure is a vendor self-report without published methodology. The researcher profile notes skepticism toward vendor safety claims, which is warranted here. However, the longitudinal commitment (since 2022) suggests genuine investment.