R0028/2026-03-26/C013/SRC01/E01
Primary evidence supporting the claim assessment.
URL: https://gail.wharton.upenn.edu/research-and-insights/tech-report-chain-of-thought/
Extract
Partially correct. GAIL's research, published as a separate report rather than in the persona report, found that CoT prompting provides minimal benefits for reasoning models (2.9-3.1% average improvement for o3-mini and o4-mini) at a substantial time cost (20-80% increase). Gemini Flash 2.5 actually showed a performance decrease (-13.1% at the 100% threshold). However, these findings come from a separate technical report dated June 2025; they are not part of the persona study and were not presented at EMNLP 2024.
Relevance to Hypotheses
| Hypothesis | Relationship | Strength |
|---|---|---|
| H1 | Partially supports | Direct evidence |
| H2 | Supports | Direct evidence |
| H3 | Contradicts | Evidence contradicts material wrongness |
Context
Evidence gathered during the 2026-03-26 research run.