R0023/2026-03-25/Q001/S02/R01

Wharton Generative AI Labs (GAIL) study on the decreasing effectiveness of chain-of-thought prompting across models.

Summary

Title: Prompting Science Report 2: The Decreasing Value of Chain of Thought in Prompting
URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5285532
Date accessed: 2026-03-25
Publication date: 2025-06-08
Author(s): Lennart Meincke, Ethan R. Mollick, Lilach Mollick, Dan Shapiro
Publication: SSRN / Wharton Generative AI Labs

Selection Decision

Included in evidence base: Yes

Rationale: Primary empirical study with a rigorous methodology (GPQA Diamond benchmark, 198 questions, 25 trials per condition, 8 models tested). It directly demonstrates that chain-of-thought (CoT) prompting can hurt performance in reasoning models and that it introduces variability, causing errors on questions the models previously answered correctly.
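
To make the repeated-trial design concrete, the following is a minimal Python sketch, not the study's own code. It works out the number of runs implied by 198 questions and 25 trials per condition, and shows one plausible way to flag questions that were answered correctly in every baseline trial but missed at least once under CoT. The function names and the toy data are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

QUESTIONS = 198  # GPQA Diamond question count, as stated in the rationale
TRIALS = 25      # trials per condition, as stated in the rationale


def runs_per_condition(questions: int = QUESTIONS, trials: int = TRIALS) -> int:
    """Number of model runs needed for one model under one prompting condition."""
    return questions * trials


def newly_unreliable(
    baseline: Dict[str, List[bool]], cot: Dict[str, List[bool]]
) -> List[str]:
    """Questions correct in every baseline trial but missed at least once with CoT.

    Hypothetical helper: each dict maps a question ID to per-trial correctness.
    """
    return sorted(
        question
        for question, results in baseline.items()
        if all(results) and not all(cot[question])
    )


# Made-up toy data (3 questions, 5 trials each); these are NOT results from the study.
baseline = {
    "q1": [True] * 5,
    "q2": [True] * 5,
    "q3": [False, True, False, True, False],
}
cot = {
    "q1": [True] * 5,
    "q2": [True, True, False, True, True],
    "q3": [True] * 5,
}

print(runs_per_condition())             # 4950
print(newly_unreliable(baseline, cot))  # ['q2']
```

In this toy example CoT helps on q3 yet introduces an occasional miss on q2, which is the kind of effect that only repeated trials per question can reveal.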