R0023/2026-03-25/Q001/H1
Statement
Multiple popular prompt engineering techniques are empirically counterproductive — controlled studies demonstrate that several widely recommended techniques (persona prompting, emotional prompting, verbose prompts, few-shot with advanced models) actively reduce performance compared to simpler alternatives.
Status
Current: Partially supported
The evidence strongly supports that persona/role prompting and chain-of-thought (CoT) prompting can be counterproductive in specific contexts. It does not support a blanket claim that these techniques are always counterproductive: the negative effects appear under identifiable conditions (factual accuracy tasks, reasoning models, specific model families). H1 is therefore partially true but overstated in its generality.
Supporting Evidence
| Evidence | Summary |
|---|---|
| SRC03-E01 | 9 statistically significant negative effects of expert personas on MMLU-Pro across 5 of 6 models |
| SRC04-E01 | Expert persona underperforms base model (68.0% vs. 71.6%) across 2,410 questions |
| SRC02-E01 | CoT decreases accuracy in reasoning models (Gemini 2.5 Flash: -13.1% at 100% threshold) |
| SRC02-E02 | CoT introduces errors on questions models would otherwise answer correctly |
Contradicting Evidence
| Evidence | Summary |
|---|---|
| SRC02-E01 | CoT does help some non-reasoning models (Gemini 2.0 Flash: +13.5%, Sonnet 3.5: +11.7%) |
| SRC03-E01 | One model (Gemini 2.0 Flash) showed positive effects from expert personas |
Reasoning
The evidence base shows a clear pattern: popular techniques can be counterproductive, but the effects are tied to specific contexts (model type, task type, measurement threshold). H1 is therefore partially supported: the core claim that these techniques can reduce performance is validated, but the universality H1 implies is not.
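The distinction between "counterproductive" and "conditionally counterproductive" can be illustrated with a minimal Python sketch. It uses only the three CoT figures quoted from SRC02-E01 and the reasoning/non-reasoning grouping given in those summaries. The `Effect` record and the `verdict` function are hypothetical names introduced for illustration; they are not part of any cited study, and the classification rule (looking only at the sign of each reported change) is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Effect:
    """One reported accuracy change (hypothetical record for illustration)."""
    technique: str
    model: str
    reasoning_model: bool  # grouping as stated in the evidence summaries
    delta: float           # reported change in accuracy, in percent


# Figures copied from the SRC02-E01 summaries; no other data is assumed.
EFFECTS = [
    Effect("CoT", "Gemini 2.5 Flash", True, -13.1),
    Effect("CoT", "Gemini 2.0 Flash", False, +13.5),
    Effect("CoT", "Sonnet 3.5", False, +11.7),
]


def verdict(effects):
    """Classify a set of effects by the signs of the reported changes.

    Assumption: only the direction of each change matters; magnitude and
    statistical significance are ignored in this sketch.
    """
    harms = any(e.delta < 0 for e in effects)
    helps = any(e.delta > 0 for e in effects)
    if harms and helps:
        return "conditionally counterproductive"
    if harms:
        return "counterproductive"
    if helps:
        return "helpful"
    return "no measured effect"


if __name__ == "__main__":
    print("All models:", verdict(EFFECTS))
    print("Reasoning models:", verdict([e for e in EFFECTS if e.reasoning_model]))
    print("Non-reasoning models:", verdict([e for e in EFFECTS if not e.reasoning_model]))
```

Splitting the same records by model type changes the classification, which is the sense in which the effect is conditional rather than universal.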
Relationship to Other Hypotheses
H1 and H3 overlap significantly. The evidence that supports H1 also supports H3, because every counterproductive effect observed is context-dependent. H1 is best understood as the "strong version" of H3: multiple techniques are indeed counterproductive, but it is more accurate to call them conditionally counterproductive (H3).