R0041/2026-04-01/Q001/SRC07/E01¶
The emergence of multiple independent sycophancy benchmarks
URL: https://www.syco-bench.com/
Extract¶
Multiple independent sycophancy benchmarks now exist:
-
syco-bench (Tim Duffy): Four-part benchmark measuring Picking Sides, Mirroring, Attribution Bias, and Delusion Acceptance. The author notes that "relationships between the different tests are generally weak, suggesting either that each test captures a relatively independent aspect of sycophancy, or that some tests may not be well-aligned with our concept of sycophancy."
-
SYCON-Bench (EMNLP 2025): Measures sycophancy in multi-turn, free-form conversational settings. Tracks "Turn of Flip" (how quickly a model conforms) and "Number of Flip" (how frequently it shifts stance under pressure).
-
ELEPHANT (Stanford/CMU, 2025): Focuses on social sycophancy in LLMs, published in Science.
-
SycEval (2025): Another evaluation framework.
-
Bloom (Anthropic): Vendor-developed automated evaluation tool testing across 16 frontier models.
Relevance to Hypotheses¶
| Hypothesis | Relationship | Strength |
|---|---|---|
| H1 | Contradicts | These are research tools, not enterprise products |
| H2 | Supports | The proliferation of benchmarks shows the field is maturing toward measurability |
| H3 | Contradicts | Independent measurement tools enable accountability |
Context¶
The weak correlation between different sycophancy tests suggests sycophancy is a multi-dimensional phenomenon, not a single trait. This has implications for enterprise productization -- there may not be a single dial to turn.