R0024/2026-03-25/Q004/SRC01/E01

Research R0024 — Sycophancy and Addiction
Run 2026-03-25
Query Q004
Source SRC01
Evidence SRC01-E01
Type Statistical

Anthropic sycophancy reduction metrics for Claude 4.5 model family

URL: https://www.anthropic.com/news/protecting-well-being-of-users

Extract

Anthropic's 4.5 model family showed substantial improvements:

  • Claude Opus 4.5, Sonnet 4.5, and Haiku 4.5 each scored 70-85% lower than Opus 4.1 on sycophancy and encouragement of user delusion metrics.
  • Stress-test course-correction rates: Opus 4.5 (10% failure), Sonnet 4.5 (16.5% failure), Haiku 4.5 (37% failure), vs. Opus 4.1 (64% failure).
  • Anthropic open-sourced Petri, its automated behavioral audit tool, making it freely available for comparing sycophancy scores across models.
  • The 4.5 model family outperformed all other frontier models tested in November 2025 on Petri's sycophancy evaluation.
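As a sanity check on the figures above, the stress-test failure rates can be converted into relative reductions against the Opus 4.1 baseline. A minimal sketch (note these course-correction rates are a separate metric from the 70-85% sycophancy figure; the arithmetic only illustrates how a relative reduction is derived):

```python
# Stress-test course-correction failure rates from the extract (percent).
failure_rates = {
    "Opus 4.5": 10.0,
    "Sonnet 4.5": 16.5,
    "Haiku 4.5": 37.0,
}
baseline = 64.0  # Opus 4.1 failure rate

for model, rate in failure_rates.items():
    # Relative reduction = how much of the baseline failure rate was eliminated.
    reduction = (baseline - rate) / baseline * 100
    print(f"{model}: {rate}% failure, {reduction:.1f}% relative reduction vs. Opus 4.1")
```

On these numbers, Opus 4.5 and Sonnet 4.5 show roughly 84% and 74% relative reductions, while Haiku 4.5 shows roughly 42%.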

The organization pledges to "continue to build new protections and safeguards" and remains "committed to publishing our methods and results transparently."

Relevance to Hypotheses

Hypothesis | Relationship | Rationale
H1 | Supports | Specific before/after metrics published with an open evaluation tool
H2 | Contradicts | Quantitative metrics clearly exist
H3 | Supports | Metrics exist but are self-reported, with no binding commitment to regular reporting and no independent audit to verify them

Context

The open-sourcing of Petri is significant because it enables independent verification of sycophancy claims across models. However, Anthropic's self-reported 70-85% improvement has not yet been independently verified using this tool.

Notes

The 70-85% figure compares two model generations (4.1 vs 4.5). This is a relative improvement, not an absolute benchmark. It does not tell us whether the absolute level of sycophancy in 4.5 models is "acceptable" by any external standard.
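The point above can be made concrete with arithmetic: the same relative reduction is consistent with very different absolute sycophancy levels. A quick sketch using hypothetical baseline rates (neither figure comes from the source):

```python
# Hypothetical absolute sycophancy rates for the older model generation.
# A fixed 75% relative reduction yields very different absolute outcomes.
for baseline_rate in (0.50, 0.02):
    after = baseline_rate * (1 - 0.75)
    print(f"baseline {baseline_rate:.0%} -> after reduction {after:.1%}")
```

A 75% cut from a 50% baseline still leaves a 12.5% absolute rate, while the same cut from a 2% baseline leaves 0.5%, which is why a relative figure alone cannot establish that the resulting level is acceptable.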