R0028/2026-03-26/C013/SRC01/E01

Research R0028 — Prompt Engineering Claims
Run 2026-03-26
Claim C013
Source SRC01
Evidence SRC01-E01
Type Factual

Primary evidence supporting the claim assessment.

URL: https://gail.wharton.upenn.edu/research-and-insights/tech-report-chain-of-thought/

Extract

Partially correct. GAIL's research found that chain-of-thought (CoT) prompting provides minimal benefits for reasoning models (2.9-3.1% average improvement for o3-mini and o4-mini) at a substantial time cost (20-80% increase), and that Gemini Flash 2.5 showed a performance decrease (-13.1% at the 100% threshold). However, these findings come from a separate technical report dated June 2025; they are not part of the persona study and were not presented at EMNLP 2024.

Relevance to Hypotheses

Hypothesis  Relationship        Strength
H1          Partially supports  Direct evidence
H2          Supports            Direct evidence
H3          Contradicts         Evidence contradicts material wrongness

Context

Evidence gathered during 2026-03-26 research run.