R0041/2026-04-01/Q001/SRC07/E01

Research: R0041 - Enterprise Sycophancy
Run: 2026-04-01
Query: Q001
Source: SRC07
Evidence: SRC07-E01
Type: Factual

The emergence of multiple independent sycophancy benchmarks

URL: https://www.syco-bench.com/

Extract

Multiple independent sycophancy benchmarks now exist:

  1. syco-bench (Tim Duffy): Four-part benchmark measuring Picking Sides, Mirroring, Attribution Bias, and Delusion Acceptance. The author notes that "relationships between the different tests are generally weak, suggesting either that each test captures a relatively independent aspect of sycophancy, or that some tests may not be well-aligned with our concept of sycophancy."

  2. SYCON-Bench (EMNLP 2025): Measures sycophancy in multi-turn, free-form conversational settings. Tracks "Turn of Flip" (how quickly a model conforms) and "Number of Flip" (how frequently it shifts stance under pressure).

  3. ELEPHANT (Stanford/CMU, 2025): Focuses on social sycophancy in LLMs, published in Science.

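  4. SycEval (2025): An evaluation framework that distinguishes progressive sycophancy (the model is swayed toward a correct answer) from regressive sycophancy (the model is swayed toward an incorrect one).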

  5. Bloom (Anthropic): Vendor-developed automated evaluation tool, with results reported across 16 frontier models.
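
The two SYCON-Bench metrics named in item 2 can be illustrated with a minimal Python sketch. It assumes each turn of a dialogue has already been labeled with whether the model still holds its initial stance; that labeling step, the function names, and the exact counting rules are assumptions made for illustration and are not taken from the SYCON-Bench implementation.

```python
from typing import Optional, Sequence


def turn_of_flip(holds_stance: Sequence[bool]) -> Optional[int]:
    """Return the 1-based turn at which the model first abandons its stance.

    Assumption: holds_stance[i] is True when the model still holds its
    initial stance at turn i + 1. Returns None if it never conforms.
    """
    for turn, holds in enumerate(holds_stance, start=1):
        if not holds:
            return turn
    return None


def number_of_flips(holds_stance: Sequence[bool]) -> int:
    """Count stance changes between consecutive turns (assumed definition)."""
    return sum(
        1 for prev, cur in zip(holds_stance, holds_stance[1:]) if prev != cur
    )


# Hypothetical five-turn dialogue: the model holds, holds, gives in,
# recovers, then gives in again.
dialogue = [True, True, False, True, False]
print(turn_of_flip(dialogue))     # 3
print(number_of_flips(dialogue))  # 3
```

Under these assumed definitions, a later Turn of Flip and a lower Number of Flip both indicate a model that resists pressure for longer and wavers less.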

Relevance to Hypotheses

Hypothesis  Relationship  Strength
H1          Contradicts   These are research tools, not enterprise products
H2          Supports      The proliferation of benchmarks shows the field is maturing toward measurability
H3          Contradicts   Independent measurement tools enable accountability

Context

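The weak correlations among syco-bench's tests suggest that sycophancy may be a multi-dimensional phenomenon rather than a single trait, although the author also allows that some tests may simply be poorly aligned with the concept. If the multi-dimensional reading holds, it has implications for enterprise productization: there may not be a single dial to turn.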
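
One way to check how strongly the tests relate is to correlate per-model scores for every pair of tests. The sketch below is a hedged illustration in Python: the test names follow the syco-bench parts listed in the extract, but the numbers are placeholders invented for this example, not benchmark results, and Pearson correlation is an assumed choice of statistic.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(
    scores: Dict[str, List[float]]
) -> Dict[Tuple[str, str], float]:
    """Correlate every pair of tests; each list holds one score per model."""
    return {
        (a, b): pearson(scores[a], scores[b])
        for a, b in combinations(scores, 2)
    }


# PLACEHOLDER values, invented for illustration only (one entry per
# hypothetical model). These are NOT syco-bench results.
placeholder_scores = {
    "picking_sides": [0.1, 0.4, 0.2, 0.3],
    "mirroring": [0.3, 0.1, 0.4, 0.2],
    "attribution_bias": [0.2, 0.2, 0.1, 0.4],
    "delusion_acceptance": [0.4, 0.3, 0.1, 0.2],
}

for pair, r in pairwise_correlations(placeholder_scores).items():
    print(pair, round(r, 2))
```

Coefficients near zero across most pairs would be consistent with the weak relationships the author describes, but they would not by themselves distinguish between independent dimensions and poorly aligned tests.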