Skip to content

R0020/2026-03-25/Q002/H3

Research R0020 — Prompt Engineering Gaps
Run 2026-03-25
Query Q002
Hypothesis H3

Statement

Some guides address sycophancy (particularly recent ones), but coverage is inconsistent, often indirect, and typically lacks the depth found in academic research. Most mainstream prompt engineering documentation does not treat sycophancy as a first-class concern.

Status

Current: Supported

This hypothesis best fits the evidence. Academic research has developed specific, tested techniques (question reframing with 24pp improvement, persona assignment with up to 63.8% improvement), but these have not been systematically incorporated into mainstream vendor documentation. Vendor guides discuss related concepts (explicit instructions, honesty) without specifically framing them as anti-sycophancy measures.

Supporting Evidence

Evidence Summary
SRC01-E01 Academic survey identifies prompt-level techniques but notes limited standardization
SRC01-E02 Measurement inconsistency and scalability gaps in mitigation approaches
SRC02-E01 Specific technique (question reframing) tested but confined to academic literature

Contradicting Evidence

Evidence Summary
SRC03-E01 NNG coverage suggests mainstream awareness is growing
SRC04-E01 Industry blog with specific quantitative claims suggests practitioner engagement

Reasoning

The gap between academic research and mainstream guides is the most diagnostic finding. Academics have tested specific prompt-level techniques with controlled experiments and quantitative results. Mainstream guides either: (a) don't mention sycophancy at all, (b) mention it in passing, or (c) discuss related concepts (clarity, explicitness, honesty) without connecting them to sycophancy mitigation. Anthropic's prompting guide, for example, provides excellent guidance on explicit instructions and constraint language that would likely reduce sycophancy, but does not frame it as such.

Relationship to Other Hypotheses

H3 bridges the valid elements of H1 (techniques exist) and H2 (mainstream coverage is thin). The nuance is in the gap between what is known academically and what is accessible in practitioner documentation.