R0044/2026-04-01/Q003/SRC03/E01
Malmqvist's sycophancy survey exemplifies the vocabulary silo pattern
URL: https://arxiv.org/html/2411.15287v1
Extract
Malmqvist's technical survey identifies four causes of LLM sycophancy:

1. Training data biases (flattery and agreeableness in online text)
2. RLHF limitations (reward hacking)
3. Lack of grounded knowledge (cannot fact-check own outputs)
4. Alignment challenges (helpfulness vs. factual accuracy)

Recommended system-side mitigations include activation steering, contrastive decoding, external knowledge integration, and dynamic prompting.
Critical finding for Q003: The paper does NOT reference automation bias, human factors research, aviation safety literature, healthcare decision support research, or any regulated-industry framework. Sycophancy is treated as a purely technical problem of model training and decoding, with no recognition that the downstream effect (human over-reliance on agreeable output) is a well-studied phenomenon in human factors under different terminology.
Relevance to Hypotheses
| Hypothesis | Relationship | Notes |
|---|---|---|
| H1 | N/A | This source does not attempt bridging |
| H2 | Supports indirectly | Demonstrates the silo pattern that partial bridging (Ibrahim et al.) is trying to overcome |
| H3 | Supports | Exemplifies the complete vocabulary separation in some research |
Context
Malmqvist's survey is representative of the AI safety community's typical treatment of sycophancy — as a technical training/decoding problem rather than a human-machine interaction problem with decades of prior research under different names. This is the exact vocabulary silo that Q003 is investigating.