R0028/2026-03-26/C015/SRC01/E01

Research: R0028 (Prompt Engineering Claims)
Run: 2026-03-26
Claim: C015
Source: SRC01
Evidence: SRC01-E01
Type: Factual

Primary evidence supporting the claim assessment.

URL: https://arxiv.org/abs/2307.09009

Extract

Confirmed. The paper "How is ChatGPT's behavior changing over time?", by researchers from Stanford and UC Berkeley, documented GPT-4's accuracy on prime number identification dropping from 84% to 51% between March and June 2023. The first arXiv version of the paper reported an even steeper decline on this task with chain-of-thought prompting (97.6% to 2.4%).

Relevance to Hypotheses

Hypothesis | Relationship | Strength
H1 | Supports | Direct evidence
H2 | Partially supports | Direct evidence
H3 | Contradicts | Evidence contradicts material wrongness

Context

Evidence gathered during the 2026-03-26 research run.