R0041/2026-03-28/Q001/H1


Research	R0041 — Enterprise Sycophancy
Run	2026-03-28
Query	Q001
Hypothesis	H1

Statement

Yes, AI vendors are actively developing and/or offering enterprise-tier products specifically targeting sycophancy reduction as a distinct feature, including dedicated API parameters, enterprise configurations, or product tiers designed for professional and engineering use cases.

Status

Current: Partially supported

The evidence shows that vendors are investing significantly in sycophancy reduction, but as a model-level property rather than a distinct enterprise product feature. Anthropic's soul document explicitly rejects sycophancy and frames honesty as important for enterprise users, and its Petri evaluation tool measures sycophancy specifically. OpenAI developed post-training techniques targeting sycophancy for GPT-5. However, no vendor offers a configurable "sycophancy reduction" parameter or a distinct product tier marketed on this basis.

Supporting Evidence

Evidence	Summary
SRC01-E01	Anthropic's 70-85% sycophancy reduction in the Claude 4.5 family, measured with a specific evaluation methodology
SRC04-E01	Petri evaluation tool specifically measures sycophancy across frontier models
SRC06-E01	Soul document explicitly rejects sycophancy, frames honesty as enterprise requirement

Contradicting Evidence

Evidence	Summary
SRC02-E01	OpenAI's sycophancy incident shows that sycophancy fixes are baked into model training rather than exposed as configurable enterprise features
SRC05-E01	OpenAI Model Spec addresses sycophancy as a model-level behavior, not an enterprise configuration

Reasoning

The evidence partially supports H1: vendors are clearly treating sycophancy reduction as a priority. However, the investment takes the form of model-level improvements (training, evaluation, constitutional guidelines) rather than enterprise-configurable features. No vendor offers an API parameter along the lines of a hypothetical "sycophancy_level=low" or a distinct "enterprise accuracy" tier. The investment is real, but the delivery mechanism differs from what the hypothesis predicts.

Relationship to Other Hypotheses

H1 and H3 overlap significantly. The distinction is whether sycophancy reduction constitutes a "product feature" (H1) or a "general improvement" (H3). The evidence suggests H3 is the more accurate characterization, though Anthropic's dedicated evaluation infrastructure (Petri) and explicit constitutional language lend some weight to H1.