R0041/2026-04-01/Q001/H2


Research	R0041 — Enterprise Sycophancy
Run	2026-04-01
Query	Q001
Hypothesis	H2

Statement

Vendors are actively researching and making incremental progress on sycophancy reduction at the model training level, and have developed evaluation tools, but have not yet productized sycophancy controls as enterprise-differentiated features.

Status

Current: Supported

Supporting Evidence

Evidence	Summary
SRC01-E01	OpenAI's detailed postmortem and pledged fixes demonstrate active attention to sycophancy, but fixes are general model improvements
SRC02-E01	Anthropic reports 70-85% sycophancy improvement across model generations
SRC04-E01	Anthropic developed Bloom, an automated sycophancy evaluation tool tested across 16 frontier models
SRC05-E01	Anthropic's 2026 constitution update shifts from rules to reasoning, addressing sycophancy philosophically
SRC06-E01	Google's Gemini 3 explicitly lists reduced sycophancy as a feature, and Gemini 1.5 benchmarked as least sycophantic
SRC07-E01	Multiple independent sycophancy benchmarks now exist (syco-bench, SYCON-bench, SycEval)
SRC03-E01	Lambert identifies sycophancy as fundamentally an "art of the model" problem, suggesting productization is premature

Contradicting Evidence

Evidence	Summary
SRC01-E02	OpenAI's sycophancy incident shows that even with active programs, regression is possible

Reasoning

The weight of evidence strongly supports this hypothesis. All three major vendors (OpenAI, Anthropic, Google) are actively working on sycophancy, and the evidence shows measurable progress (SRC02-E01, SRC06-E01) and purpose-built evaluation tooling (SRC04-E01, SRC07-E01). However, none has translated this work into an enterprise-differentiated product: general model improvements benefit all users, but enterprises cannot specifically configure or contract for non-sycophantic behavior. The regression described in SRC01-E02 shows that progress is not monotonic, but it is consistent with the hypothesis, which claims only incremental progress.

Relationship to Other Hypotheses

H2 occupies the middle ground between H1 (full productization, eliminated) and H3 (no meaningful action, weakened by evidence of genuine progress). Of the three hypotheses, the evidence most strongly supports this intermediate position.