R0041/2026-04-01/Q001/SRC02/E01

Research R0041 — Enterprise Sycophancy
Run 2026-04-01
Query Q001
Source SRC02
Evidence SRC02-E01
Type Reported

Anthropic's sycophancy reduction claims for Claude Sonnet 4.5

URL: https://www.anthropic.com/news/claude-sonnet-4-5

Extract

Anthropic states that Claude's "extensive safety training" has contributed to "reducing concerning behaviors like sycophancy, deception, power-seeking, and the tendency to encourage delusional thinking." The company claims a 70-85% reduction in sycophancy relative to previous model generations. The model is described as "our most aligned frontier model yet."

No enterprise-specific API parameters or configurations for controlling sycophancy are mentioned, and no details are given on how sycophancy was measured. The improvements are presented as general, model-wide enhancements available to all users, not as enterprise-differentiated features.

Separately, Anthropic began evaluating Claude for sycophancy in 2022 and has "steadily refined how it trains, tests, and reduces sycophancy, with the most recent models being the least sycophantic to date."

Relevance to Hypotheses

Hypothesis | Relationship | Strength
H1 | Contradicts | No enterprise-specific product or API parameters offered
H2 | Supports | Demonstrates active, long-running research and measurable improvement
H3 | Contradicts | The claimed improvements are substantial, even if self-reported

Context

The 70-85% figure is a vendor self-report without published methodology. The researcher profile notes skepticism toward vendor safety claims, which is warranted here. However, the longitudinal commitment (since 2022) suggests genuine investment.