
R0023/2026-03-25/Q001/H1

Statement

Multiple popular prompt engineering techniques are empirically counterproductive — controlled studies demonstrate that several widely recommended techniques (persona prompting, emotional prompting, verbose prompts, few-shot with advanced models) actively reduce performance compared to simpler alternatives.

Status

Current: Partially supported

The evidence strongly supports the claim that persona/role prompting and chain-of-thought (CoT) prompting can be counterproductive in specific contexts. It does not support a blanket claim that these techniques are always counterproductive; they are counterproductive under identifiable conditions (factual accuracy tasks, reasoning models, specific model families). H1 is therefore partially true but overstated in its generality.

Supporting Evidence

| Evidence | Summary |
|---|---|
| SRC03-E01 | Nine statistically significant negative effects of expert personas on MMLU-Pro, across 5 of 6 models |
| SRC04-E01 | Expert persona underperforms base model (68.0% vs. 71.6%) across 2,410 questions |
| SRC02-E01 | CoT decreases accuracy in reasoning models (Gemini 2.5 Flash: -13.1% at 100% threshold) |
| SRC02-E02 | CoT introduces errors on questions models would otherwise answer correctly |
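
To put the SRC04-E01 gap in absolute terms, the short sketch below converts the two reported accuracies into a percentage-point difference and an approximate count of questions. It assumes both conditions were scored on the same 2,410 questions, which the summary implies but does not state. The function and variable names are illustrative only.

```python
# Accuracies reported in SRC04-E01, in percent.
EXPERT_PERSONA_ACC = 68.0
BASE_MODEL_ACC = 71.6
# Assumption: both conditions were scored on the same question set.
NUM_QUESTIONS = 2410


def accuracy_gap(base_acc, persona_acc, n_questions):
    """Return (gap in percentage points, approximate gap in questions)."""
    # Round to one decimal to avoid floating-point noise in the subtraction.
    gap_points = round(base_acc - persona_acc, 1)
    gap_questions = round(gap_points / 100 * n_questions)
    return gap_points, gap_questions


gap_points, gap_questions = accuracy_gap(
    BASE_MODEL_ACC, EXPERT_PERSONA_ACC, NUM_QUESTIONS
)
print(f"Persona trails base by {gap_points} points, about {gap_questions} questions")
```

Under that assumption, the 3.6-point gap corresponds to roughly 87 of the 2,410 questions.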

Contradicting Evidence

| Evidence | Summary |
|---|---|
| SRC02-E01 | CoT does help some non-reasoning models (Gemini 2.0 Flash: +13.5%, Sonnet 3.5: +11.7%) |
| SRC03-E01 | One model (Gemini 2.0 Flash) showed positive effects from expert personas |

Reasoning

The evidence base shows a clear pattern: popular techniques can be counterproductive, but the effects are tied to specific contexts (model type, task type, measurement threshold). H1 is therefore partially supported: the core claim that such techniques can reduce performance holds, but the universality H1 implies does not.
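
The context dependence described here can be illustrated with a small sketch that groups the reported CoT effects by model type and checks whether the sign of the effect flips. The `Effect` record, the `classify` helper, and the verdict labels are hypothetical names introduced for illustration. The numbers are the SRC02-E01 figures as summarized in this report, and the reasoning/non-reasoning grouping follows those summaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Effect:
    model: str
    context: str  # model type, e.g. "reasoning" or "non-reasoning"
    delta: float  # reported accuracy change from CoT (SRC02-E01)


# CoT effects as summarized in this report (SRC02-E01).
COT_EFFECTS = [
    Effect("Gemini 2.5 Flash", "reasoning", -13.1),
    Effect("Gemini 2.0 Flash", "non-reasoning", +13.5),
    Effect("Sonnet 3.5", "non-reasoning", +11.7),
]


def classify(effects):
    """Label a technique by the sign pattern of its effects (hypothetical labels)."""
    harmful = {e.context for e in effects if e.delta < 0}
    helpful = {e.context for e in effects if e.delta > 0}
    if harmful and helpful:
        # Harm appears in some contexts and benefit in others.
        return "conditionally counterproductive"
    if harmful:
        return "counterproductive"
    return "not counterproductive"


print(classify(COT_EFFECTS))
```

Because the CoT effect is negative for the reasoning model and positive for the two non-reasoning models, this sketch returns the conditional label rather than the unconditional one.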

Relationship to Other Hypotheses

H1 and H3 overlap substantially. The evidence that supports H1 also supports H3, because every counterproductive effect found is context-dependent. H1 is best understood as the "strong version" of H3: multiple techniques are indeed counterproductive, but it is more accurate to call them conditionally counterproductive (H3).