E01


Research	R0028 — Prompt Engineering Claims
Run	2026-03-26
Claim	C015
Source	SRC01
Evidence	SRC01-E01
Type	Factual

Primary evidence supporting the claim assessment.

URL: https://arxiv.org/abs/2307.09009

Extract

Confirmed. The paper 'How is ChatGPT's behavior changing over time?', by researchers from Stanford and UC Berkeley, documented GPT-4's accuracy on prime number identification dropping from 84% to 51% between March and June 2023. The study also reported a steeper decline on the same task with chain-of-thought prompting, from 97.6% to 2.4%.

Relevance to Hypotheses

Hypothesis	Relationship	Strength
H1	Supports	Direct evidence
H2	Partially supports	Direct evidence
H3	Contradicts	Evidence contradicts material wrongness

Context

Evidence gathered during the 2026-03-26 research run.