R0028/2026-03-26/C015
Claim: A study from Stanford and Berkeley tracked GPT-4's behavior between March and June 2023 and documented accuracy dropping from 84% to 51% on certain tasks in three months.
BLUF: Confirmed. The paper 'How is ChatGPT's behavior changing over time?' by researchers from Stanford and UC Berkeley documented GPT-4's accuracy on prime number identification dropping from 84% to 51% between March and June 2023. The study also found an even more dramatic decline on the same task with chain-of-thought prompting (97.6% to 2.4%).
Probability: Almost certain (95-99%) | Confidence: High
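The two declines in the BLUF are easier to compare on a common scale. Below is a small arithmetic sketch. It uses only the rounded figures quoted above (84% to 51%, and 97.6% to 2.4%), and `decline` is a hypothetical helper written for illustration. It reports each drop both in percentage points and as a share of the March accuracy.

```python
def decline(before: float, after: float):
    """Return (percentage-point drop, relative drop in percent) between two accuracy figures."""
    points = before - after              # absolute drop, in percentage points
    relative = 100 * points / before     # drop as a share of the starting accuracy
    return points, relative

# Accuracy figures as quoted in the BLUF (percent, March vs June 2023)
for label, before, after in [
    ("prime identification", 84.0, 51.0),
    ("with chain-of-thought", 97.6, 2.4),
]:
    points, relative = decline(before, after)
    print(f"{label}: {points:.1f} points, {relative:.1f}% relative")
```

On these inputs, the first decline is 33 points, roughly 39% of the March figure. The second is about 95 points, roughly 97.5% of the March figure. That contrast is why the chain-of-thought result reads as the more dramatic one.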
Summary
Hypotheses
| ID | Hypothesis | Status |
|----|------------|--------|
| H1 | Claim is accurate: 84% to 51% drop documented | Supported |
| H2 | Partially correct: the specific task matters | Inconclusive |
| H3 | Claim is materially wrong | Eliminated |
Searches
| ID | Target | Results | Selected |
|----|--------|---------|----------|
| S01 | Primary search | 10 | 3 |
Sources
| Source | Description | Reliability | Relevance |
|--------|-------------|-------------|-----------|
| SRC01 | Chen et al., "How is ChatGPT's behavior changing over time?" | High | High |