R0023/2026-03-25/Q001/S02/R01
Wharton Generative AI Labs (GAIL) study on the decreasing effectiveness of chain-of-thought (CoT) prompting across models.
Summary
| Field | Value |
|---|---|
| Title | Prompting Science Report 2: The Decreasing Value of Chain of Thought in Prompting |
| URL | https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5285532 |
| Date accessed | 2026-03-25 |
| Publication date | 2025-06-08 |
| Author(s) | Lennart Meincke, Ethan R. Mollick, Lilach Mollick, Dan Shapiro |
| Publication | SSRN / Wharton Generative AI Labs |
Selection Decision
Included in evidence base: Yes
Rationale: Primary empirical study with rigorous methodology (GPQA Diamond, 198 questions, 25 trials per condition, 8 models tested). Directly demonstrates that CoT can hurt performance in reasoning models and that it introduces variability, causing errors on questions previously answered correctly.