C013


Research	R0028 — Prompt Engineering Claims
Run	2026-03-26
Claim	C013

Claim: The same GAIL research found that chain-of-thought prompting hurts performance on reasoning models.

BLUF: Partially correct. GAIL's research found that CoT prompting provides minimal benefit for reasoning models (2.9-3.1% average improvement for o3-mini and o4-mini) at a substantial time cost (20-80% increase), and Gemini Flash 2.5 showed a performance decrease (-13.1% at the 100% threshold). However, these findings come from a separate technical report dated June 2025, not from the persona study, and the report was not presented at EMNLP 2024.

Probability: Likely (55-80%) | Confidence: Medium

Correction needed: The finding comes from a different GAIL report (Prompting Science Report 2, dated June 2025) than the persona study, so it is not 'the same GAIL research.' The result is also more nuanced: CoT yields negligible gains for reasoning models rather than consistently hurting performance.

Summary

Entity	Description
Claim Definition	Claim text, scope, status
Assessment	Full analytical product with reasoning chain
ACH Matrix	Evidence × hypotheses diagnosticity analysis
Self-Audit	ROBIS-adapted 4-domain process audit

Hypotheses

ID	Hypothesis	Status
H1	Claim is accurate — CoT hurts reasoning model performance	Inconclusive
H2	CoT shows minimal benefit for reasoning models and can hurt specific models, but the research is a separate report, not the 'same' research	Supported
H3	Claim is materially wrong	Eliminated

Searches

ID	Target	Results	Selected
S01	Primary search	10	3

Sources

Source	Description	Reliability	Relevance
SRC01	Wharton GAIL — The Decreasing Value of Chain of Thought	High	High