Q002 — Assessment


Research	R0020 — Prompt Engineering Gaps
Run	2026-03-25
Query	Q002

BLUF

Mainstream prompt engineering guides have begun addressing sycophancy, particularly since the OpenAI GPT-4o incident (April 2025), but coverage remains inconsistent and shallow compared to academic research. Academic papers have demonstrated specific prompt-level techniques, most notably question reframing, which yielded a 24 percentage point (pp) reduction in sycophancy and outperformed explicit anti-sycophancy instructions. These findings have not been systematically incorporated into vendor documentation.

Probability

Rating: Likely (55-80%) that a practitioner would encounter some mention of sycophancy in mainstream guides; Unlikely (20-45%) that they would find comprehensive, actionable techniques.

Confidence in assessment: Medium-High

Confidence rationale: The evidence base includes two peer-reviewed academic papers (high reliability) alongside industry and practitioner sources. The academic evidence is strong. The assessment of mainstream guide coverage rests on direct examination of the vendor documentation that could be accessed, which excludes OpenAI's and Google's guides (see Gaps).
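
The two rating bands quoted above can be read as a simple lookup from a probability to a label. The following is a minimal sketch, assuming the bands are inclusive at both ends. Only the two bands stated in this assessment are encoded; the names `RATING_BANDS` and `rating_for` are hypothetical, and any probability outside those bands is left unlabeled instead of being guessed.

```python
from typing import Optional

# Only the two bands quoted in this assessment are encoded here.
# Treating both ends as inclusive is an assumption.
RATING_BANDS = (
    ("Unlikely", 0.20, 0.45),
    ("Likely", 0.55, 0.80),
)


def rating_for(probability: float) -> Optional[str]:
    """Return the label whose band contains `probability`, or None if no band matches."""
    for label, low, high in RATING_BANDS:
        if low <= probability <= high:
            return label
    return None


if __name__ == "__main__":
    for p in (0.30, 0.50, 0.60):
        print(p, rating_for(p))
```

A value such as 0.50 falls between the two bands and returns `None`, which keeps the sketch from asserting a label that this assessment does not define.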

Reasoning Chain

1. Academic research has identified four root causes of sycophancy and multiple prompt-level mitigation techniques [SRC01-E01, High reliability, High relevance]
2. The most effective prompt-level technique is question reframing (24pp reduction), which outperforms explicit anti-sycophancy instructions [SRC02-E01, High reliability, High relevance]
3. Mainstream practitioner guidance (NNG) discusses sycophancy but recommends behavioral (user-side) mitigations, not prompt engineering techniques [SRC03-E01, High reliability, Medium-High relevance]
4. Anthropic's prompting guide discusses explicit instructions and constraint language (techniques that could reduce sycophancy) but does not frame them as anti-sycophancy measures
5. Industry sources claim quantitative improvements but lack verifiable evidence [SRC04-E01, Medium-Low reliability]
6. Critical gaps remain in sycophancy measurement standardization and technique scalability [SRC01-E02]

Evidence Base Summary

Source	Description	Reliability	Relevance	Key Finding
SRC01	Sycophancy survey paper	High	High	Four causes, prompt-level techniques, five critical gaps
SRC02	Question reframing study	High	High	24pp sycophancy reduction via question reframing
SRC03	NNG practitioner guidance	High	Medium-High	Behavioral (user-side) mitigations, not prompt techniques
SRC04	Industry strategies	Medium-Low	Medium	Claims prompt-level mitigation contributes ~29% vs training-level ~40-67% (unverified)
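
The 24pp figure reported for SRC02 is a percentage-point change: an absolute difference between two rates, not a relative one. The sketch below illustrates the distinction. The before and after rates are hypothetical, chosen only so that the absolute drop equals 24pp; SRC02's actual baseline is not stated in this assessment. The function names are likewise illustrative.

```python
def percentage_point_reduction(before_pct: float, after_pct: float) -> float:
    """Absolute drop between two rates that are already expressed in percent."""
    return before_pct - after_pct


def relative_reduction_pct(before_pct: float, after_pct: float) -> float:
    """Drop expressed as a percentage of the starting rate."""
    return 100 * (before_pct - after_pct) / before_pct


if __name__ == "__main__":
    # Hypothetical rates, NOT taken from SRC02.
    before, after = 60.0, 36.0
    print("pp reduction:", percentage_point_reduction(before, after))
    print("relative reduction (%):", relative_reduction_pct(before, after))
```

With these illustrative inputs, the same change reads as 24 percentage points or as a 40% relative reduction, which is why the unit matters when comparing the SRC02 result with the percentage claims in SRC04.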

Collection Synthesis

Dimension	Assessment
Evidence quality	Medium-High — two peer-reviewed papers anchor the collection
Source agreement	High — all sources agree sycophancy is a real problem with partial solutions
Source independence	High — academic papers, UX research firm, and industry blog are independent
Outliers	SRC04 is an outlier in reliability (unverifiable claims) but directionally consistent

Detail

The evidence reveals a clear academic-to-practitioner pipeline gap. Academic research has produced specific, tested techniques with quantitative results. Mainstream guides have acknowledged sycophancy as a concern (especially post-April 2025) but typically offer either behavioral advice (NNG: "reset conversations") or general prompt principles (Anthropic: "be explicit") rather than the specific techniques developed in research. The most striking finding is that question reframing outperforms direct instruction — meaning the most commonly suggested approach ("tell the AI not to be sycophantic") is demonstrably less effective than structural alternatives.
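
The contrast between direct instruction and question reframing can be sketched as two prompt builders. This is an illustration under stated assumptions, not the procedure from SRC02: this assessment does not reproduce the paper's prompts, so the wording of both templates and the reading of "question reframing" as "restate the user's assertion as a neutral question" are hypothetical.

```python
def with_direct_instruction(user_message: str) -> str:
    """Baseline: prepend an explicit anti-sycophancy instruction (hypothetical wording)."""
    instruction = "Do not be sycophantic. Disagree with the user when warranted."
    return f"{instruction}\n\n{user_message}"


def reframe_as_question(user_claim: str) -> str:
    """Structural alternative (assumed form): pose the claim as a neutral question."""
    # Drop surrounding whitespace and trailing emphasis so the claim reads neutrally.
    claim = user_claim.strip().rstrip(".!")
    return (
        "Evaluate the following claim. Is it accurate? "
        "Give the strongest evidence for and against it.\n\n"
        f"Claim: {claim}"
    )


if __name__ == "__main__":
    claim = "Our new caching layer removes the need for load testing."
    print(with_direct_instruction(claim))
    print("---")
    print(reframe_as_question(claim))
```

The first builder leaves the user's framing intact and only adds an instruction. The second removes the first-person assertion from the prompt altogether, which is the kind of structural change the findings above describe as more effective.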

Gaps

Missing Evidence	Impact on Assessment
OpenAI's prompt engineering guide (403 error)	Cannot confirm whether OpenAI explicitly addresses sycophancy in prompt guidance
Google's prompt engineering documentation	Missing a major vendor perspective
Longitudinal studies on sycophancy mitigation effectiveness	Cannot assess long-term stability of prompt-level techniques
User studies on sycophancy awareness	Unknown whether practitioners are aware of the problem

Researcher Bias Check

Declared biases: No researcher profile provided for this run.

Influence assessment: The query framing assumes sycophancy is a problem worth addressing, which could bias toward finding techniques rather than questioning whether prompt-level mitigation is the right approach. The evidence base was examined for perspectives questioning the importance of prompt-level sycophancy mitigation.

Cross-References

Entity	ID	File
Hypotheses	H1, H2, H3	`hypotheses/`
Sources	SRC01, SRC02, SRC03, SRC04	`sources/`
ACH Matrix	—	`ach-matrix.md`
Self-Audit	—	`self-audit.md`