R0041/2026-04-01/Q001/H2

Research R0041 — Enterprise Sycophancy
Run 2026-04-01
Query Q001
Hypothesis H2

Statement

Vendors are actively researching and making incremental progress on sycophancy reduction at the model training level, and have developed evaluation tools, but have not yet productized sycophancy controls as enterprise-differentiated features.

Status

Current: Supported

Supporting Evidence

Evidence Summary
SRC01-E01 OpenAI's detailed postmortem and pledged fixes demonstrate active attention to sycophancy, but the fixes are general model improvements rather than enterprise-specific controls
SRC02-E01 Anthropic reports 70-85% sycophancy improvement across model generations
SRC04-E01 Anthropic developed Bloom, an automated sycophancy evaluation tool tested across 16 frontier models
SRC05-E01 Anthropic's 2026 constitution update shifts from rules to reasoning, addressing sycophancy philosophically
SRC06-E01 Google's Gemini 3 explicitly lists reduced sycophancy as a feature, and Gemini 1.5 was benchmarked as least sycophantic
SRC07-E01 Multiple independent sycophancy benchmarks now exist (syco-bench, SYCON-bench, SycEval)
SRC03-E01 Lambert identifies sycophancy as fundamentally an "art of the model" problem, suggesting productization is premature

Contradicting Evidence

Evidence Summary
SRC01-E02 OpenAI's sycophancy incident shows that even with active programs, regression is possible

Reasoning

The weight of evidence strongly supports this hypothesis. All three major vendors (OpenAI, Anthropic, Google) have active sycophancy research programs, have made measurable progress, and have developed evaluation tools. However, none has translated this work into an enterprise-differentiated product. The pattern is that general model improvements benefit all users, while enterprises cannot specifically configure or contract for non-sycophantic behavior. The one contradicting item (SRC01-E02) shows that regression is possible despite active programs; this tempers the progress claim but does not point to productized controls.

Relationship to Other Hypotheses

H2 occupies the middle ground between H1 (full productization, eliminated) and H3 (no meaningful action, weakened by evidence of genuine progress). The evidence supports this intermediate position more strongly than either alternative.