R0023/2026-03-25/Q001/H1


Research	R0023 — Counterproductive advice and prompt lifecycle
Run	2026-03-25
Query	Q001
Hypothesis	H1

Statement

Multiple popular prompt engineering techniques are empirically counterproductive — controlled studies demonstrate that several widely recommended techniques (persona prompting, emotional prompting, verbose prompts, few-shot with advanced models) actively reduce performance compared to simpler alternatives.

Status

Current: Partially supported

The evidence strongly supports that persona/role prompting and chain-of-thought (CoT) prompting can be counterproductive in specific contexts. However, it does not support a blanket claim that these techniques are always counterproductive: the negative effects appear under identifiable conditions (factual accuracy tasks, reasoning models, specific model families). The evidence listed for this hypothesis also covers only persona and CoT prompting; it does not address emotional prompting, verbose prompts, or few-shot prompting, which H1 also names. H1 is therefore partially true but overstated in its generality.

Supporting Evidence

Evidence	Summary
SRC03-E01	9 statistically significant negative effects of expert personas on MMLU-Pro across 5 of 6 models
SRC04-E01	Expert persona underperforms base model (68.0% vs. 71.6%) across 2,410 questions
SRC02-E01	CoT decreases accuracy in reasoning models (Gemini Flash 2.5: -13.1% at 100% threshold)
SRC02-E02	CoT introduces errors on questions models would otherwise answer correctly

Contradicting Evidence

Evidence	Summary
SRC02-E01	CoT does help some non-reasoning models (Gemini Flash 2.0: +13.5%, Sonnet 3.5: +11.7%)
SRC03-E01	One model (Gemini 2.0 Flash) showed positive effects from expert personas

Reasoning

The evidence base shows a clear pattern: popular techniques can be counterproductive, but the counterproductive effects are tied to specific contexts (model type, task type, measurement threshold). This makes H1 partially supported — the core claim about counterproductive effects is validated, but the universality implied by H1 is not.
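
The pattern described above can be made explicit by tabulating the direction of each cited effect per technique. The following is a minimal sketch, not part of the research run itself: the rows are transcribed from the evidence tables in this entry, while the names `EVIDENCE`, `H1_TECHNIQUES`, and `summarize`, and the short "scope" labels, are assumptions introduced only for illustration.

```python
from collections import defaultdict

# Rows transcribed from the evidence tables in this entry:
# (technique, evidence id, scope, direction of effect).
# The "scope" labels are shorthand chosen for this sketch.
EVIDENCE = [
    ("expert persona", "SRC03-E01", "5 of 6 models", "negative"),
    ("expert persona", "SRC03-E01", "Gemini 2.0 Flash", "positive"),
    ("expert persona", "SRC04-E01", "2,410 questions", "negative"),
    ("chain-of-thought", "SRC02-E01", "Gemini Flash 2.5", "negative"),
    ("chain-of-thought", "SRC02-E02", "otherwise-correct questions", "negative"),
    ("chain-of-thought", "SRC02-E01", "Gemini Flash 2.0", "positive"),
    ("chain-of-thought", "SRC02-E01", "Sonnet 3.5", "positive"),
]

# Techniques named in the H1 statement (hypothetical short labels).
H1_TECHNIQUES = ["expert persona", "emotional prompting", "verbose prompts", "few-shot"]


def summarize(rows):
    """Return 'negative', 'positive', or 'mixed' for each technique."""
    directions = defaultdict(set)
    for technique, _evidence_id, _scope, direction in rows:
        directions[technique].add(direction)
    return {
        technique: "mixed" if len(seen) > 1 else next(iter(seen))
        for technique, seen in directions.items()
    }


summary = summarize(EVIDENCE)
# Techniques that H1 names but that no listed evidence row covers.
uncovered = [t for t in H1_TECHNIQUES if t not in summary]

print(summary)
print(uncovered)
```

Under these assumptions, both techniques with listed evidence come out as "mixed", and three of the four techniques named in H1 have no evidence row at all. Both observations are consistent with a "partially supported" status rather than a full confirmation.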

Relationship to Other Hypotheses

H1 and H3 overlap substantially. The evidence that supports H1 also supports H3, because every counterproductive effect in that evidence is context-dependent. H1 is best understood as the "strong version" of H3: it is true that multiple techniques can be counterproductive, but it is more accurate to say they are conditionally counterproductive (H3).