R0024/2026-03-25/Q004/H1

Research R0024 — Sycophancy and Addiction
Run 2026-03-25
Query Q004
Hypothesis H1

Statement

Yes, multiple AI companies have published quantitative before/after sycophancy reduction metrics and/or made forward-looking commitments to measurable reduction targets.

Status

Current: Partially supported

Anthropic has published before/after metrics, and Google has claimed measurable reductions with less detail. However, these metrics lack standardization, independent verification, and binding commitment mechanisms. OpenAI acknowledged the problem, but its methodology is opaque and carries the caveat that "future measurements may not be directly comparable to past ones."

Supporting Evidence

Evidence Summary
SRC01-E01 Anthropic published a 70-85% sycophancy reduction in its 4.5 models relative to its 4.1 models and open-sourced the Petri evaluation tool
SRC02-E01 OpenAI published post-mortems on the GPT-4o sycophancy incident

Contradicting Evidence

Evidence Summary
SRC03-E01 A SciELO analysis found that Anthropic places responsibility on users despite acknowledging the problem
SRC04-E01 A 42-state attorney general coalition demanded commitments, implying that companies had not already made them

Reasoning

H1 is partially supported. Metrics exist from Anthropic and Google, but the claim that companies have committed to measurable reduction targets overstates the reality. Anthropic is the strongest example: it published before/after metrics and open-sourced an evaluation tool. Google claimed "measurable reductions" in Gemini 3 but gave fewer specifics. OpenAI published incident analyses but no systematic before/after metrics. No company has made a binding commitment to ongoing sycophancy reduction targets.

Relationship to Other Hypotheses

H1 is partially supported and H2 is partially eliminated; H3 best describes the current state.