R0028/2026-03-26/C015/SRC01/E01
Primary evidence supporting the claim assessment.
URL: https://arxiv.org/abs/2307.09009
Extract
Confirmed. The paper "How is ChatGPT's behavior changing over time?", by researchers at Stanford and UC Berkeley, reports that GPT-4's accuracy on prime number identification dropped from 84% to 51% between March and June 2023. The study also reported an even steeper decline on the same task with chain-of-thought prompting (97.6% to 2.4%).
Relevance to Hypotheses
| Hypothesis | Relationship | Strength |
|---|---|---|
| H1 | Supports | Direct evidence |
| H2 | Partially supports | Direct evidence |
| H3 | Contradicts | Evidence contradicts material wrongness |
Context
Evidence gathered during the 2026-03-26 research run.