
R0041/2026-03-28/Q001/SRC01/E01

Research R0041 — Enterprise Sycophancy
Run 2026-03-28
Query Q001
Source SRC01
Evidence SRC01-E01
Type Reported

Anthropic reports a 70-85% reduction in sycophancy for the Claude 4.5 model family, measured with multi-turn behavioral audits and achieved in part through reinforcement learning training.

URL: https://www.anthropic.com/news/protecting-well-being-of-users

Extract

Anthropic began evaluating Claude for sycophancy in 2022 and has steadily refined its training, testing, and reduction methods. Claude Opus 4.5 scored 70-85% lower on sycophancy measures than Opus 4.1. The evaluation uses multi-turn behavioral audits in which one Claude model acts as an "auditor," engaging another model across dozens of exchanges, while a separate "judge" model grades the responses. Human spot-checks verify the judge's accuracy. Claude 4.5 shows "dramatically fewer instances of encouragement of user delusion, a kind of extreme form of sycophancy." Reinforcement learning training rewards appropriate responses to sensitive topics. System prompts include the guidance "Don't be a sycophant!" The protections appear to apply across all of Claude.ai, and no enterprise-exclusive features are mentioned.

Relevance to Hypotheses

Hypothesis | Relationship | Strength
H1 | Supports | Demonstrates significant vendor investment in sycophancy reduction with quantified results, though not as a distinct enterprise feature
H2 | Contradicts | Conclusively shows that Anthropic is actively targeting sycophancy reduction
H3 | Supports | Sycophancy reduction is achieved through model training (RL), not enterprise configuration; "no enterprise-exclusive features mentioned"

Context

The 70-85% figure is self-reported by Anthropic and measured against its own earlier models. No independent verification of this specific claim was found. The evaluation methodology (multi-turn audits) is publicly documented through the Petri tool.

Notes

The appearance of the phrase "Don't be a sycophant!" in system prompts suggests that sycophancy reduction is achieved partly through system-level prompt engineering, not solely through model training.