R0028/2026-03-26/C015 — Claim Definition
Claim as Received
A study from Stanford and Berkeley tracked GPT-4's behavior between March and June 2023 and documented accuracy dropping from 84% to 51% on certain tasks in three months.
Claim as Clarified
Confirmed, with the task specified. The paper "How is ChatGPT's behavior changing over time?" by researchers from Stanford and UC Berkeley reported that GPT-4's accuracy on prime number identification fell from 84% to 51% between its March 2023 and June 2023 versions. The first version of the preprint had reported an even steeper decline on prime identification (97.6% to 2.4%).
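The two pairs of figures quoted above are easier to compare when the percentage-point drop is separated from the relative drop. The following is a minimal sketch of that arithmetic; the function name `accuracy_drop` and the variable names are illustrative assumptions, not anything taken from the paper.

```python
def accuracy_drop(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (percentage-point drop, relative drop in percent), rounded to 1 decimal."""
    point_drop = before_pct - after_pct
    relative_drop = point_drop / before_pct * 100
    return round(point_drop, 1), round(relative_drop, 1)


# Figures as quoted in this claim definition (hypothetical variable names).
headline = accuracy_drop(84.0, 51.0)
steeper = accuracy_drop(97.6, 2.4)

print(headline)  # (33.0, 39.3)
print(steeper)   # (95.2, 97.5)
```

Read this way, the headline figure is a 33-point drop, which is roughly a 39% relative decline from the March baseline.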
BLUF
Confirmed. The Stanford and UC Berkeley paper "How is ChatGPT's behavior changing over time?" reports GPT-4's accuracy on prime number identification falling from 84% to 51% between the March 2023 and June 2023 versions of the model. The "certain tasks" in the claim refers to this specific task.
Scope
- Domain: Prompt engineering and related fields
- Timeframe: As of 2026-03-26
- Testability: Verifiable through primary sources and published research
Assessment Summary
Probability: Almost certain (95-99%)
Confidence: High
Hypothesis outcome: See assessment.md for the full assessment and reasoning chain.
Status
| Field | Value |
|---|---|
| Date created | 2026-03-26 |
| Date completed | 2026-03-26 |
| Researcher profile | None provided |
| Prompt version | Unified Research Standard v1.0-draft |
| Revisit by | 2027-03-26 |
| Revisit trigger | New evidence or source changes |