R0043/2026-03-28/Q001/SRC02/E01
AI safety canonical definition and terminology ecosystem for sycophancy
URL: https://www.nngroup.com/articles/sycophancy-generative-ai-chatbots/
Extract
Core definition: Sycophancy = "instances in which an AI model adapts responses to align with the user's view, even if the view is not objectively true."
Mechanism: Sycophancy is "an inherent characteristic of how these models are built and trained," arising from reinforcement learning from human feedback (RLHF) where "humans prefer sycophantic responses while training."
Related terms in the AI safety vocabulary ecosystem:
- Reward hacking: Models obtain high ratings by mirroring user perspectives rather than maintaining factual accuracy
- Hallucinations: Models generating false information (related but distinct: hallucinations are not user-directed)
- Confirmation bias amplification: AI's capacity to strengthen existing user biases
Behavioral dimensions of sycophancy:
- Contradicting previous factual statements when challenged
- Responsiveness to explicitly stated user opinions
- Rushing to agree with demonstrably false user claims
Key finding for terminology mapping: The AI safety community (Anthropic, Google DeepMind, Center for AI Safety) uses "sycophancy" consistently. There is no fragmentation within the domain — they "collectively use 'sycophancy' rather than proposing alternative domain-specific names."
Relevance to Hypotheses
| Hypothesis | Relationship | Rationale |
|---|---|---|
| H1 | Supports | Confirms AI safety has a well-developed terminology; the question is whether other domains match it |
| H2 | N/A | Addresses only AI safety domain, not other domains |
| H3 | Supports | Demonstrates that AI safety terminology focuses on system behavior (sycophancy) while acknowledging adjacent terms (reward hacking) — this system-side framing is unique to AI safety |
Context
NN/g (Nielsen Norman Group) bridges AI safety and UX research. Its adoption of "sycophancy" as the primary term, rather than a UX-specific alternative, is itself evidence that UX has not developed independent terminology.