Research: R0044 (Expanded Vocabulary Research)
Run: 2026-04-01
Query: Q003
Source: SRC03
Evidence: SRC03-E01
Type: Analytical

Malmqvist's sycophancy survey exemplifies the vocabulary silo pattern

URL: https://arxiv.org/html/2411.15287v1

Extract

Malmqvist's technical survey identifies four causes of LLM sycophancy:

1. Training data biases (flattery and agreeableness in online text)
2. RLHF limitations (reward hacking)
3. Lack of grounded knowledge (cannot fact-check own outputs)
4. Alignment challenges (helpfulness vs. factual accuracy)

Recommended system-side mitigations include activation steering, contrastive decoding, external knowledge integration, and dynamic prompting.

Critical finding for Q003: The paper does NOT reference automation bias, human factors research, aviation safety literature, healthcare decision support research, or any regulated-industry framework. Sycophancy is treated as a purely technical problem of model training and decoding, with no recognition that the downstream effect (human over-reliance on agreeable output) is a well-studied phenomenon in human factors under different terminology.

Relevance to Hypotheses

Hypothesis | Relationship | Strength
H1 | N/A | This source does not attempt bridging
H2 | Supports indirectly | Demonstrates the silo pattern that partial bridging (Ibrahim et al.) is trying to overcome
H3 | Supports | Exemplifies the complete vocabulary separation in some research

Context

Malmqvist's survey is representative of the AI safety community's typical treatment of sycophancy — as a technical training/decoding problem rather than a human-machine interaction problem with decades of prior research under different names. This is the exact vocabulary silo that Q003 is investigating.