C015 — Claim Definition


Research	R0028 — Prompt Engineering Claims
Run	2026-03-26
Claim	C015

Claim as Received

A study from Stanford and Berkeley tracked GPT-4's behavior between March and June 2023 and documented accuracy dropping from 84% to 51% on certain tasks in three months.

Claim as Clarified

The paper 'How Is ChatGPT's Behavior Changing over Time?', by researchers from Stanford and UC Berkeley, reports that GPT-4's accuracy on a prime-number identification task fell from 84% to 51% between its March 2023 and June 2023 versions. The "certain tasks" in the claim as received therefore refers to this one task, and the "three months" to the gap between those two model versions.

BLUF

Confirmed. The paper 'How Is ChatGPT's Behavior Changing over Time?' by researchers from Stanford and UC Berkeley reports that GPT-4's accuracy on prime-number identification fell from 84% to 51% between its March 2023 and June 2023 versions. The first version of the paper reported an even steeper drop (97.6% to 2.4%) on an earlier, primes-only form of the same task.
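The two figure pairs quoted above can be read either as a percentage-point drop or as a relative drop, and the two readings are easy to confuse when the claim is repeated. The sketch below works out both; the helper name `accuracy_drop` is hypothetical and introduced only for illustration, and the inputs are the rounded percentages quoted in this record, not the paper's raw data.

```python
def accuracy_drop(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (percentage-point drop, relative drop in percent).

    Hypothetical helper for illustration; inputs are accuracies in percent.
    """
    points = before_pct - after_pct          # absolute change in percentage points
    relative = points / before_pct * 100     # change as a share of the starting accuracy
    return points, relative


# Figure pairs as quoted in this claim record (rounded values).
pairs = [
    ("84% -> 51%", 84.0, 51.0),
    ("97.6% -> 2.4%", 97.6, 2.4),
]

for label, before, after in pairs:
    points, relative = accuracy_drop(before, after)
    print(f"{label}: {points:.1f} points, {relative:.1f}% relative")

# prints:
# 84% -> 51%: 33.0 points, 39.3% relative
# 97.6% -> 2.4%: 95.2 points, 97.5% relative
```

Stating the 84% to 51% change as "33 percentage points" rather than "33%" keeps the restated claim aligned with the figures above.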

Scope

Domain: Prompt engineering and related fields
Timeframe: As of 2026-03-26
Testability: Verifiable through primary sources and published research

Assessment Summary

Probability: Almost certain (95-99%)

Confidence: High

Hypothesis outcome: See assessment.md for the full assessment and reasoning chain.

Status

Field	Value
Date created	2026-03-26
Date completed	2026-03-26
Researcher profile	None provided
Prompt version	Unified Research Standard v1.0-draft
Revisit by	2027-03-26
Revisit trigger	New evidence or source changes