
R0023/2026-03-25/Q001/SRC02

Wharton GAIL study on the decreasing value of chain-of-thought prompting

Source

Title: Prompting Science Report 2: The Decreasing Value of Chain of Thought in Prompting
Publisher: SSRN / Wharton Generative AI Labs
Author(s): Lennart Meincke, Ethan R. Mollick, Lilach Mollick, Dan Shapiro
Date: 2025-06-08
URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5285532
Type: Research paper (technical report)

Summary

Reliability: High
Relevance: High
Bias (missing data): Low risk
Bias (measurement): Low risk
Bias (selective reporting): Low risk
Bias (randomization): N/A — not an RCT
Bias (protocol deviation): N/A — not an RCT
Bias (COI/funding): Low risk

Rationale

Reliability: Rigorous methodology: GPQA Diamond (198 PhD-level questions), 25 trials per condition, and 8 models tested across reasoning and non-reasoning categories. Results are reported at multiple correctness thresholds (100%, 90%, 51%, and average). Wharton institutional affiliation.
Relevance: Directly addresses whether chain-of-thought, arguably the most widely recommended prompting technique, can be counterproductive. Highly relevant to Q001.
Bias flags: Low risk. Uses established benchmarks, tests both positive and negative outcomes, and reports all results, including cases where CoT helps. Not funded by any AI vendor.
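To make the threshold-based scoring concrete, here is a minimal sketch of how per-question accuracy over repeated trials maps to the report's correctness thresholds. The function names and the toy data are hypothetical (the paper runs 25 trials per question; the example below uses 5 for brevity); only the scoring idea is taken from the summary above.

```python
# Hypothetical sketch of multi-threshold scoring: a question counts as
# "correct at threshold t" if it is answered correctly in at least a
# fraction t of its repeated trials.

def threshold_accuracy(trials_per_question, threshold):
    """Fraction of questions answered correctly in at least
    `threshold` of their trials (each trial is a 0/1 outcome)."""
    correct = sum(
        1 for trials in trials_per_question
        if sum(trials) / len(trials) >= threshold
    )
    return correct / len(trials_per_question)

def average_accuracy(trials_per_question):
    """Mean per-trial accuracy, pooled over all questions."""
    total = sum(sum(t) for t in trials_per_question)
    n = sum(len(t) for t in trials_per_question)
    return total / n

# Toy data: 3 questions x 5 trials each (the paper uses 25 trials).
results = [
    [1, 1, 1, 1, 1],  # always right
    [1, 1, 1, 1, 0],  # one slip -- fails the 100% threshold
    [1, 0, 1, 0, 0],  # unstable
]
print(threshold_accuracy(results, 1.00))  # only 1 of 3 questions is perfect
print(threshold_accuracy(results, 0.51))  # 2 of 3 are right in a majority of trials
print(average_accuracy(results))          # pooled per-trial accuracy
```

This illustrates the mechanism behind the 100%-threshold finding: added run-to-run variability can leave average accuracy roughly unchanged while sharply reducing the share of questions a model gets right every time.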

Evidence Extracts

SRC02-E01: CoT decreases perfect accuracy in reasoning models (Gemini Flash 2.5: -13.1% at the 100% threshold).
SRC02-E02: CoT introduces run-to-run variability, causing errors on easy questions the model would otherwise answer correctly.