R0057/2026-04-01/C018 — Assessment

BLUF

Confirmed. Anthropic reports 70-85% sycophancy reductions in its Claude 4.5 models compared to Opus 4.1, measured with the Petri evaluation set. OpenAI reports substantially reduced sycophancy in GPT-5 and has released public evaluations. Both vendors ship these improvements to all users.

Probability

Rating: Very likely (80-95%)

Confidence in assessment: High

Confidence rationale: Based directly on vendor announcements that include quantified metrics.

Reasoning Chain

  1. Anthropic's latest models (Opus 4.5, Sonnet 4.5, Haiku 4.5) scored 70-85% lower on sycophancy than Opus 4.1. OpenAI reports that GPT-5 substantially reduces sycophancy. Both companies released public evaluation metrics, and both ship the improvements to all users rather than restricting them to enterprise tiers. [SRC01-E01, High reliability, High relevance]

  2. JUDGMENT: Confirmed. Both vendors report measurable sycophancy reductions in their latest models (quantified at 70-85% for Anthropic), back them with public evaluations, and ship them to all users.

Evidence Base Summary

| Source | Description | Reliability | Relevance | Key Finding |
|---|---|---|---|---|
| SRC01 | Anthropic and OpenAI sycophancy reduction announcements | High | High | Anthropic reports a 70-85% sycophancy reduction in its latest models; OpenAI reports substantial improvements in GPT-5 |

Collection Synthesis

| Dimension | Assessment |
|---|---|
| Evidence quality | High |
| Source agreement | High |
| Source independence | Medium |
| Outliers | None identified |

Detail

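The evidence supports the assessment: it comes directly from vendor announcements that include quantified metrics. Because each vendor reports on its own models, source independence is rated Medium.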

Gaps

| Missing Evidence | Impact on Assessment |
|---|---|
| Additional independent verification | Would strengthen confidence |

Researcher Bias Check

Declared biases: Anti-sycophancy bias could influence interpretation toward confirming sycophancy claims.

Influence assessment: Mitigated by reliance on primary sources (the vendor announcements in SRC01).

Cross-References

| Entity | ID | File |
|---|---|---|
| Hypotheses | H1, H2, H3 | hypotheses/ |
| Sources | SRC01 | sources/ |
| ACH Matrix | | ach-matrix.md |
| Self-Audit | | self-audit.md |