R0028/2026-03-26/C015

Claim: A study from Stanford and Berkeley tracked GPT-4's behavior between March and June 2023 and documented accuracy dropping from 84% to 51% on certain tasks in three months.

BLUF: Confirmed. The paper 'How is ChatGPT's behavior changing over time?' by researchers from Stanford and UC Berkeley documented GPT-4's accuracy on prime number identification dropping from 84% to 51% between March and June 2023. The study also found an even more dramatic decline on the same task with chain-of-thought prompting (97.6% to 2.4%).

Probability: Almost certain (95-99%) | Confidence: High
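
The two accuracy pairs quoted in the BLUF can be restated as percentage-point and relative drops. The sketch below is only illustrative arithmetic on those quoted figures; the helper name `accuracy_change` and the task labels are hypothetical, and the derived numbers are not taken from the paper itself.

```python
def accuracy_change(before_pct, after_pct):
    """Return (percentage-point drop, relative drop as % of the starting accuracy)."""
    point_drop = before_pct - after_pct
    relative_drop = 100.0 * point_drop / before_pct
    return point_drop, relative_drop


# Figures as quoted in the BLUF (March 2023 accuracy, June 2023 accuracy).
# The labels are illustrative, not the paper's own task names.
reported = {
    "prime identification": (84.0, 51.0),
    "prime identification, chain-of-thought": (97.6, 2.4),
}

for task, (march, june) in reported.items():
    points, relative = accuracy_change(march, june)
    print(f"{task}: {points:.1f} points, {relative:.1f}% relative")
```

On these inputs the first pair works out to a 33.0-point drop (about 39.3% relative) and the second to a 95.2-point drop (about 97.5% relative).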


Summary

Entity | Description
Claim Definition | Claim text, scope, status
Assessment | Full analytical product with reasoning chain
ACH Matrix | Evidence x hypotheses diagnosticity analysis
Self-Audit | ROBIS-adapted 4-domain process audit

Hypotheses

ID | Hypothesis | Status
H1 | Claim is accurate: 84% to 51% drop documented | Supported
H2 | Partially correct: the specific task matters | Inconclusive
H3 | Claim is materially wrong | Eliminated

Searches

ID | Target | Results | Selected
S01 | Primary search | 10 | 3

Sources

Source | Description | Reliability | Relevance
SRC01 | Chen et al., "How is ChatGPT's behavior changing over time?" | High | High