R0028/2026-03-26/C015 — Claim Definition

Claim as Received

A study from Stanford and Berkeley tracked GPT-4's behavior between March and June 2023 and documented accuracy dropping from 84% to 51% on certain tasks in three months.

Claim as Clarified

The paper 'How is ChatGPT's behavior changing over time?', by researchers from Stanford and UC Berkeley, reports that GPT-4's accuracy on prime number identification dropped from 84% to 51% between March and June 2023.

BLUF

Confirmed. The paper 'How is ChatGPT's behavior changing over time?' by researchers from Stanford and UC Berkeley documented GPT-4's accuracy on prime number identification dropping from 84% to 51% between March and June 2023. The study also found an even more dramatic decline on the same task with chain-of-thought prompting (97.6% to 2.4%).
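
As a quick arithmetic check on the figures quoted above, the following minimal sketch separates the drop in percentage points from the relative drop, since "accuracy dropping from 84% to 51%" can be read either way. The helper name accuracy_change is hypothetical and used only for illustration; the input numbers are the ones stated in this record, not new data.

```python
def accuracy_change(before: float, after: float) -> tuple[float, float]:
    """Return (change in percentage points, relative change in percent)."""
    absolute = after - before                    # percentage-point difference
    relative = (after - before) / before * 100   # change relative to the starting accuracy
    return absolute, relative


# Accuracy figures (in percent) as quoted in this record.
figures = [
    ("prime identification", 84.0, 51.0),
    ("with chain-of-thought", 97.6, 2.4),
]

for label, before, after in figures:
    absolute, relative = accuracy_change(before, after)
    print(f"{label}: {absolute:+.1f} points, {relative:+.1f}% relative")

# prime identification: -33.0 points, -39.3% relative
# with chain-of-thought: -95.2 points, -97.5% relative
```

Read this way, the headline figure is a 33-point drop (about 39% relative to the March accuracy), which is consistent with the claim as received.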

Scope

  • Domain: Prompt engineering and related fields
  • Timeframe: As of 2026-03-26
  • Testability: Verifiable through primary sources and published research

Assessment Summary

Probability: Almost certain (95-99%)

Confidence: High

Hypothesis outcome: See assessment.md for full reasoning chain.

[Full assessment in assessment.md.]

Status

  • Date created: 2026-03-26
  • Date completed: 2026-03-26
  • Researcher profile: None provided
  • Prompt version: Unified Research Standard v1.0-draft
  • Revisit by: 2027-03-26
  • Revisit trigger: New evidence or source changes