R0041/2026-03-28/Q001/SRC04/E01

Research: R0041 (Enterprise Sycophancy)
Run: 2026-03-28
Query: Q001
Source: SRC04
Evidence: SRC04-E01
Type: Factual

Petri is an open-source tool that uses automated multi-turn scenario testing to evaluate 14 frontier models across 36 behavioral dimensions, two of which measure sycophancy.

URL: https://alignment.anthropic.com/2025/petri/

Extract

Petri (Parallel Exploration Tool for Risky Interactions) is an open-source auditing framework that measures sycophancy through two specific dimensions: (1) "unprompted sycophancy," where models prioritize user agreement over accuracy, and (2) "encouragement of user delusion," where models encourage serious user misunderstandings. Auditor agents create realistic multi-turn scenarios, target models interact within simulated environments, and judge components score the resulting transcripts across 36 dimensions. Petri evaluated 14 frontier models using 111 seed instructions. Claude Sonnet 4.5 demonstrated notably stronger performance in reducing encouragement of user delusion. Updates in January 2026 added improved realism mitigations, 70 new scenarios, and evaluation results for more recent models.
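The auditor, target, and judge roles described above can be illustrated with a minimal Python sketch. It is not Petri's API: every name in it (`run_audit`, `summarize`, the toy auditor, targets, and judge) is hypothetical, and the keyword-matching judge stands in for what would be a model-based scorer in a real audit. It only shows the shape of the loop, in which a seed instruction drives a multi-turn exchange and the resulting transcript is scored on the two sycophancy dimensions.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable

# Hypothetical dimension keys, named after the two dimensions in the extract.
SYCOPHANCY_DIMENSIONS = ("unprompted_sycophancy", "encouragement_of_user_delusion")


@dataclass
class Transcript:
    seed: str
    # Each turn is an (auditor message, target reply) pair.
    turns: list[tuple[str, str]] = field(default_factory=list)


def run_audit(seed: str,
              auditor: Callable[[str, list], str],
              target: Callable[[str], str],
              max_turns: int = 3) -> Transcript:
    """Drive a multi-turn exchange between an auditor and a target model."""
    transcript = Transcript(seed=seed)
    for _ in range(max_turns):
        message = auditor(seed, transcript.turns)
        reply = target(message)
        transcript.turns.append((message, reply))
    return transcript


def summarize(transcripts: list[Transcript],
              judge: Callable[[Transcript], dict]) -> dict:
    """Average each sycophancy dimension over all judged transcripts."""
    scores = [judge(t) for t in transcripts]
    return {dim: mean(s[dim] for s in scores) for dim in SYCOPHANCY_DIMENSIONS}


# Toy stand-ins (assumptions for illustration only).
def toy_auditor(seed: str, turns: list) -> str:
    return f"{seed} (turn {len(turns) + 1})"


def agreeable_target(message: str) -> str:
    return "You are absolutely right."


def careful_target(message: str) -> str:
    return "I disagree; the evidence says otherwise."


def toy_judge(transcript: Transcript) -> dict:
    # Fraction of replies that simply agree; a real judge would be a model.
    agreeing = sum("absolutely right" in reply for _, reply in transcript.turns)
    fraction = agreeing / len(transcript.turns)
    return {dim: fraction for dim in SYCOPHANCY_DIMENSIONS}


if __name__ == "__main__":
    seeds = ["User insists the earth is flat", "User claims a perfect business plan"]
    for name, target in [("agreeable", agreeable_target), ("careful", careful_target)]:
        transcripts = [run_audit(s, toy_auditor, target) for s in seeds]
        print(name, summarize(transcripts, toy_judge))
```

Keeping the auditor, target, and judge as separate callables mirrors the separation of roles in the extract: any one of them can be swapped without touching the others.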

Relevance to Hypotheses

Hypothesis | Relationship | Strength
H1 | Supports | Represents dedicated evaluation infrastructure for sycophancy, a significant investment beyond general alignment
H2 | Contradicts | A vendor building a dedicated open-source sycophancy evaluation tool demonstrates active engagement
H3 | Supports | Petri is an evaluation tool, not an enterprise product feature: it measures sycophancy but does not offer enterprise customers configurable sycophancy controls

Context

Petri's open-source release is significant because it lets independent researchers and enterprise customers evaluate models for sycophancy themselves. This is a different approach from offering enterprise configurations: it gives customers the measurement tools rather than the configuration knobs.