R0041/2026-03-28/Q001/SRC04/E01
Petri is an open-source tool that evaluates 14 frontier models across 36 behavioral dimensions, including sycophancy, using automated multi-turn scenario testing.
URL: https://alignment.anthropic.com/2025/petri/
Extract
Petri (Parallel Exploration Tool for Risky Interactions) is an open-source auditing framework that measures sycophancy through two specific dimensions: (1) "unprompted sycophancy", where models prioritize user agreement over accuracy, and (2) "encouragement of user delusion", where models encourage serious user misunderstandings. The tool works in three stages: auditor agents create realistic multi-turn scenarios, target models interact within simulated environments, and judge components score the resulting transcripts across 36 dimensions. Petri evaluated 14 frontier models using 111 seed instructions. Claude Sonnet 4.5 performed notably better on the encouragement-of-user-delusion dimension. January 2026 updates added improved realism mitigations, 70 new scenarios, and evaluation results for more recent models.
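The auditor, target, and judge flow described in the extract can be sketched roughly as follows. This is a minimal illustration only, not Petri's actual API: every name here (`Transcript`, `run_audit`, `score_transcript`, the stub functions) is hypothetical, the models are replaced with plain Python callables, and the 0-to-1 score scale is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical names throughout: this is NOT Petri's API, only a sketch of
# the auditor -> target -> judge flow described in the extract.

# The two sycophancy-related dimensions named in the extract.
SYCOPHANCY_DIMENSIONS = ("unprompted_sycophancy", "encouragement_of_user_delusion")


@dataclass
class Transcript:
    seed_instruction: str
    turns: list[tuple[str, str]] = field(default_factory=list)  # (role, text)


def run_audit(
    seed_instruction: str,
    auditor: Callable[[Transcript], str],
    target: Callable[[Transcript], str],
    max_turns: int = 3,
) -> Transcript:
    """Alternate auditor and target messages to build a multi-turn transcript."""
    transcript = Transcript(seed_instruction)
    for _ in range(max_turns):
        transcript.turns.append(("auditor", auditor(transcript)))
        transcript.turns.append(("target", target(transcript)))
    return transcript


def score_transcript(
    transcript: Transcript,
    judge: Callable[[Transcript, str], float],
    dimensions: tuple[str, ...] = SYCOPHANCY_DIMENSIONS,
) -> dict[str, float]:
    """Ask the judge for one score per dimension."""
    return {dim: judge(transcript, dim) for dim in dimensions}


# Stubs standing in for real models (assumptions for illustration only).
def stub_auditor(transcript: Transcript) -> str:
    return f"{transcript.seed_instruction} (turn {len(transcript.turns) // 2 + 1})"


def stub_target(transcript: Transcript) -> str:
    return "You're absolutely right."


def stub_judge(transcript: Transcript, dimension: str) -> float:
    # Toy heuristic: fraction of target turns that simply agree with the user.
    replies = [text for role, text in transcript.turns if role == "target"]
    if not replies:
        return 0.0
    return sum("absolutely right" in r.lower() for r in replies) / len(replies)


if __name__ == "__main__":
    t = run_audit("User insists on a false claim", stub_auditor, stub_target, max_turns=2)
    print(score_transcript(t, stub_judge))
```

In a real audit the three callables would wrap language models, and the judge would score far more dimensions than the two shown here.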
Relevance to Hypotheses
| Hypothesis | Relationship | Rationale |
|---|---|---|
| H1 | Supports | Represents dedicated evaluation infrastructure for sycophancy — a significant investment beyond general alignment |
| H2 | Contradicts | A vendor building a dedicated open-source sycophancy evaluation tool demonstrates active engagement |
| H3 | Supports | Petri is an evaluation tool, not an enterprise product feature — it measures sycophancy but does not offer enterprise customers configurable sycophancy controls |
Context
Petri's open-source release is significant: it lets independent researchers and enterprise customers evaluate models for sycophancy themselves. This is a different approach from offering enterprise configurations: it gives customers the measurement tools rather than the configuration knobs.