R0024/2026-03-25/Q004/SRC01/E01¶
Anthropic sycophancy reduction metrics for Claude 4.5 model family
URL: https://www.anthropic.com/news/protecting-well-being-of-users
Extract¶
Anthropic's 4.5 model family showed substantial improvements:
- Claude Opus 4.5, Sonnet 4.5, and Haiku 4.5 each scored 70-85% lower than Opus 4.1 on metrics of sycophancy and encouragement of user delusion.
- Stress-test course-correction rates: Opus 4.5 (10% failure), Sonnet 4.5 (16.5% failure), Haiku 4.5 (37% failure), vs. Opus 4.1 (64% failure).
- Anthropic open-sourced Petri, its automated behavioral audit tool, which can be used to compare sycophancy scores across models.
- The 4.5 model family outperformed all other frontier models tested in November 2025 on Petri's sycophancy evaluation.
The organization pledges to "continue to build new protections and safeguards" and remains "committed to publishing our methods and results transparently."
Relevance to Hypotheses¶
| Hypothesis | Relationship | Strength |
|---|---|---|
| H1 | Supports | Specific before/after metrics published with an open evaluation tool |
| H2 | Contradicts | Quantitative metrics clearly exist |
| H3 | Supports | Metrics exist but are self-reported, lack binding commitment to regular reporting, and no independent audit has verified them |
Context¶
The open-sourcing of Petri is significant because it enables independent verification of sycophancy claims across models. However, Anthropic's self-reported 70-85% improvement has not yet been independently verified using this tool.
Notes¶
The 70-85% figure compares two model generations (4.1 vs 4.5). This is a relative improvement, not an absolute benchmark. It does not tell us whether the absolute level of sycophancy in 4.5 models is "acceptable" by any external standard.
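To make the relative-vs-absolute distinction concrete, the stress-test failure rates from the Extract can be converted into relative reductions against the Opus 4.1 baseline. This is our own arithmetic check, not a figure from the source, and the stress-test metric is a separate measurement from the 70-85% sycophancy-score comparison:

```python
# Stress-test course-correction failure rates from the Extract.
baseline = 0.64  # Opus 4.1
rates = {
    "Opus 4.5": 0.10,
    "Sonnet 4.5": 0.165,
    "Haiku 4.5": 0.37,
}

for name, rate in rates.items():
    # Relative reduction vs. baseline; the absolute rate is what
    # an external acceptability standard would have to judge.
    reduction = (baseline - rate) / baseline
    print(f"{name}: {reduction:.1%} relative reduction, "
          f"{rate:.1%} absolute failure rate")
```

Note how Haiku 4.5's roughly 42% relative reduction still leaves a 37% absolute failure rate, which is exactly why a relative improvement alone cannot establish that the absolute level is acceptable.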