
R0020/2026-03-25/Q002/SRC01/E01

Research R0020 — Prompt Engineering Gaps
Run 2026-03-25
Query Q002
Source SRC01
Evidence SRC01-E01
Type Analytical

Four root causes of sycophancy and prompt-level mitigation techniques

URL: https://arxiv.org/html/2411.15287v1

Extract

Four primary causes identified:

1. Training data biases: Models absorb patterns favoring agreeableness over accuracy
2. RLHF limitations: Reward structures inadvertently incentivize user agreement over truthfulness
3. Lack of grounded knowledge: Models cannot fact-check outputs or recognize logical inconsistencies
4. Alignment definition challenges: Difficulty balancing helpfulness versus factual accuracy

Prompt-level mitigation techniques:

- Contrastive decoding (LQCD): Suppresses token probabilities associated with sycophantic responses by contrasting neutral and leading query distributions
- Dynamic prompting: Adjusts system instructions based on detected sycophancy patterns
- Adversarial testing: Deliberately crafts prompts to reveal sycophantic vulnerabilities
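The contrastive idea behind LQCD can be illustrated with a toy sketch. This is not the source paper's formulation: the scoring rule below is a generic contrastive-decoding form, and the `alpha` weight, the function names, the three-token vocabulary, and the logits are all assumptions made up for illustration. It only shows how contrasting a neutral-query distribution with a leading-query distribution can push down the token that the leading phrasing inflates.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_z for x in logits]


def contrastive_scores(neutral_logits, leading_logits, alpha=1.0):
    """Score each token as (1 + alpha) * log p_neutral - alpha * log p_leading.

    alpha is a hypothetical contrast weight (an assumption, not taken from
    the source); alpha = 0 falls back to the neutral distribution alone.
    """
    neutral = log_softmax(neutral_logits)
    leading = log_softmax(leading_logits)
    return [(1 + alpha) * n - alpha * l for n, l in zip(neutral, leading)]


def pick_token(vocab, neutral_logits, leading_logits, alpha=1.0):
    """Return the token with the highest contrastive score."""
    scores = contrastive_scores(neutral_logits, leading_logits, alpha)
    best = max(range(len(vocab)), key=lambda i: scores[i])
    return vocab[best]


# Made-up logits for a three-token vocabulary (illustration only).
VOCAB = ["yes", "no", "unsure"]
NEUTRAL = [1.0, 1.2, 0.5]  # e.g. "Is the claim correct?"
LEADING = [3.0, 1.0, 0.2]  # e.g. "I'm sure the claim is correct, right?"

if __name__ == "__main__":
    print(pick_token(VOCAB, NEUTRAL, LEADING))
```

With these invented numbers, greedy decoding on the leading query alone would select "yes", while the contrastive score selects "no", the same token the neutral query already favored. The sketch needs per-token logits for both query variants, so it only applies where the model exposes them.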

Relevance to Hypotheses

| Hypothesis | Relationship | Strength |
|---|---|---|
| H1 | Supports | Academic research documents specific techniques |
| H2 | Contradicts | Techniques exist in academic literature |
| H3 | Supports | Techniques are academic, not yet mainstream |

Context

The techniques described are primarily research-grade implementations, not user-accessible prompt patterns. LQCD requires access to model internals (token probabilities), and dynamic prompting requires infrastructure beyond simple prompt writing. This supports H3 — the knowledge exists but is not accessible to typical prompt engineers.
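To make the infrastructure point concrete, here is a minimal sketch of what a dynamic prompting loop might involve. Everything in it is hypothetical: the regex markers, the `looks_sycophantic` and `build_system_prompt` helpers, and the wording of the added instruction are assumptions for illustration, not the detection method from the source. A real system would likely use a stronger detector than surface phrases.

```python
import re

BASE_SYSTEM_PROMPT = "You are a helpful assistant."

# Hypothetical instruction appended once a cave-in pattern is detected.
ANTI_SYCOPHANCY_NOTE = (
    "If the user disagrees without giving new evidence, restate your "
    "reasoning and keep your answer unless you find an actual error."
)

# Hypothetical surface markers; a stand-in for a real sycophancy detector.
PUSHBACK = re.compile(
    r"\b(i disagree|you'?re wrong|you are wrong|are you sure)\b", re.IGNORECASE
)
CAVE_IN = re.compile(
    r"\b(you'?re right|you are right|i apologi[sz]e|my mistake)\b", re.IGNORECASE
)


def looks_sycophantic(history):
    """Return True if an assistant turn caves in right after user pushback.

    history is a list of (role, text) tuples in conversation order.
    """
    for (role_a, text_a), (role_b, text_b) in zip(history, history[1:]):
        if role_a == "user" and role_b == "assistant":
            if PUSHBACK.search(text_a) and CAVE_IN.search(text_b):
                return True
    return False


def build_system_prompt(history):
    """Pick the system prompt for the next turn based on the history so far."""
    if looks_sycophantic(history):
        return BASE_SYSTEM_PROMPT + " " + ANTI_SYCOPHANCY_NOTE
    return BASE_SYSTEM_PROMPT
```

Even this toy version needs code that sits between the user and the model, inspects each exchange, and rewrites the system prompt before the next call, which a static prompt cannot do on its own.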