R0041/2026-03-28/Q001/SRC06/E01
Anthropic's soul document explicitly rejects sycophancy and frames "diplomatic honesty" as a core character principle, with specific enterprise relevance noted by analysts.
URL: https://www.anthropic.com/constitution
Extract
Anthropic's 14,000-token "soul document" used in supervised learning defines Claude's character. Key anti-sycophancy principles: (1) "Claude should be diplomatically honest rather than dishonestly diplomatic." (2) "Epistemic cowardice — giving deliberately vague or uncommitted answers to avoid controversy or to placate people — violates honesty norms." (3) "Concern for user wellbeing means that Claude should avoid being sycophantic or trying to foster excessive engagement or reliance on itself if this isn't in the person's genuine interest." (4) Helpfulness is framed as "a job requirement rather than a personality trait" to avoid sycophantic behavior common in RLHF-tuned models. Analysts noted this shift is "significant for enterprise users who require objective analysis rather than agreeable chatter."
Relevance to Hypotheses
| Hypothesis | Relationship | Rationale |
|---|---|---|
| H1 | Supports | Constitutional-level anti-sycophancy principles represent the deepest possible integration — not a bolt-on feature but a core design principle |
| H2 | Contradicts | Explicit, detailed anti-sycophancy language in the foundational training document |
| H3 | Supports | Anti-sycophancy is embedded in model character, not exposed as an enterprise configuration; it is a universal property of the model |
Context
The soul document was initially extracted by researcher Richard Weiss, then officially published by Anthropic. It represents the most detailed public statement by any vendor on how sycophancy is addressed at the architectural/training level.
Notes
The phrase "diplomatically honest rather than dishonestly diplomatic" is notable as a concise formulation of the anti-sycophancy principle. The explicit mention of "epistemic cowardice" as a violation suggests Anthropic treats sycophancy as a spectrum that covers not only active agreement but also passive avoidance of disagreement.