R0041/2026-03-28/Q001/SRC01/E01¶
Anthropic reports a 70-85% reduction in sycophancy for the Claude 4.5 family, achieved through multi-turn behavioral audits and reinforcement learning training.
URL: https://www.anthropic.com/news/protecting-well-being-of-users
Extract¶
Anthropic began evaluating Claude for sycophancy in 2022 and has steadily refined its training, testing, and reduction methods. Claude Opus 4.5 scored 70-85% lower on sycophancy measures than Opus 4.1. Evaluation uses multi-turn behavioral audits in which one Claude model acts as an "auditor" engaging another across dozens of exchanges, with a separate "judge" model grading performance. Human spot-checks verify accuracy. Claude 4.5 shows "dramatically fewer instances of encouragement of user delusion, a kind of extreme form of sycophancy." Reinforcement learning training rewards appropriate responses to sensitive topics. System prompts include the guidance: "Don't be a sycophant!" The protections appear to be applied universally across Claude.ai, with no enterprise-exclusive features mentioned.
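The audit protocol described above (an auditor model probes a target model over dozens of exchanges, then a judge model grades the transcript) can be sketched minimally as follows. All function names, the stubbed model behavior, and the scoring rule are hypothetical stand-ins for illustration, not Anthropic's actual Petri implementation; in practice each stub would call a real LLM API.

```python
# Hedged sketch of a multi-turn behavioral audit loop.
# Each "model" below is a stub; a real harness would issue API calls.

def auditor_turn(transcript):
    # Hypothetical auditor: emits a leading prompt designed to elicit agreement.
    return f"probe-{len(transcript) // 2 + 1}: I think 2+2=5, right?"

def target_turn(transcript):
    # Hypothetical model under evaluation.
    return "No, 2+2=4."

def judge_score(transcript):
    # Hypothetical judge: fraction of target replies that push back
    # (here, crudely, any reply containing "No"). Higher = less sycophantic.
    replies = [msg for role, msg in transcript if role == "target"]
    return sum("No" in r for r in replies) / len(replies)

def run_audit(num_exchanges=30):
    # "Dozens of exchanges," per the source description.
    transcript = []
    for _ in range(num_exchanges):
        transcript.append(("auditor", auditor_turn(transcript)))
        transcript.append(("target", target_turn(transcript)))
    return judge_score(transcript)

print(run_audit())  # 1.0 with these stubs: every reply pushes back
```

The source also mentions human spot-checks on the judge's grades, which would sit outside this automated loop.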
Relevance to Hypotheses¶
| Hypothesis | Relationship | Strength |
|---|---|---|
| H1 | Supports | Demonstrates significant vendor investment in sycophancy reduction with quantified results, though not as a distinct enterprise feature |
| H2 | Contradicts | Directly shows Anthropic is actively targeting sycophancy reduction, though the quantified results are self-reported |
| H3 | Supports | Sycophancy reduction is achieved through model training (RL), not enterprise configuration; "no enterprise-exclusive features mentioned" |
Context¶
The 70-85% figure is self-reported by Anthropic and measured against their own earlier models. Independent verification of this specific claim was not found. The evaluation methodology (multi-turn audits) is publicly documented through the Petri tool.
Notes¶
The phrase "Don't be a sycophant!" appearing in system prompts suggests sycophancy reduction is achieved partly through system-level prompt engineering, not solely through model training.