R0024/2026-03-25/Q004/H3
Statement
Some before/after metrics exist, but commitments are limited, inconsistent, and unstandardized. No company has made a binding commitment to ongoing, measurable sycophancy-reduction targets backed by a regular reporting cadence and independent verification.
Status
Current: Supported
This hypothesis best describes the evidence. Anthropic has published the strongest metrics (a 70-85% reduction and an open-sourced evaluation tool, Petri), but even Anthropic has not committed to a regular reporting cadence. OpenAI's methodology is opaque, and OpenAI explicitly warns that "future measurements may not be directly comparable to past ones." Google claims improvements but has not published a detailed methodology. No company has submitted its sycophancy metrics to an independent third-party audit.
Supporting Evidence
| Evidence | Summary |
|---|---|
| SRC01-E01 | Anthropic's metrics exist but lack binding commitment to regular reporting |
| SRC02-E01 | OpenAI warns its own metrics may not be comparable over time |
| SRC03-E01 | Anthropic places responsibility on users; industry response is inconsistent |
| SRC04-E01 | 42 state attorneys general (AGs) demanded commitments that companies had not yet made, with a January 2026 deadline |
Contradicting Evidence
| Evidence | Summary |
|---|---|
| SRC01-E01 | Anthropic's open-sourcing of Petri could be interpreted as a commitment mechanism rather than a limited effort |
Reasoning
The evidence base presents a clear picture: some companies (particularly Anthropic) have published some metrics, but the industry lacks standardized measurement, binding commitments, regular reporting cadences, and independent verification. Georgetown Law criticized the approach as "intermittent blog posts that offer single snapshots based on self-selected metrics." That a 42-state AG coalition had to demand commitments by January 2026 suggests that companies had not made such commitments voluntarily.
Relationship to Other Hypotheses
H3 is supported as the best description. H1 is partially supported (metrics exist) but overstates commitment strength. H2 is eliminated.