R0023/2026-03-25/Q003/SRC01

Landmark Stanford/Berkeley study documenting GPT-3.5 and GPT-4 behavior changes over time

Source

| Field | Value |
| --- | --- |
| Title | How is ChatGPT's behavior changing over time? |
| Publisher | arXiv / Harvard Data Science Review |
| Author(s) | Lingjiao Chen, Matei Zaharia, James Zou |
| Date | 2023-07-18 (preprint), 2023-10-31 (final arXiv revision) |
| URL | https://arxiv.org/abs/2307.09009 |
| Type | Research paper |

Summary

| Dimension | Rating |
| --- | --- |
| Reliability | High |
| Relevance | High |
| Bias: Missing data | Low risk |
| Bias: Measurement | Low risk |
| Bias: Selective reporting | Low risk |
| Bias: Randomization | N/A (not an RCT) |
| Bias: Protocol deviation | N/A (not an RCT) |
| Bias: COI/Funding | Low risk |

Rationale

| Dimension | Rationale |
| --- | --- |
| Reliability | Stanford and UC Berkeley researchers. Published in Harvard Data Science Review (peer-reviewed). Systematic comparison of the same prompts on the March 2023 and June 2023 versions of GPT-3.5 and GPT-4, covering 7 task categories. |
| Relevance | A widely cited study documenting how responses to identical prompts changed, and in several cases degraded, across model versions. Directly answers Q003. |
| Bias flags | Low risk across the board. The authors are academic researchers with no affiliation to OpenAI, the vendor whose models were tested. Tested multiple task types rather than a cherry-picked few. |
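
The same-prompt design noted under Reliability can be illustrated with a minimal sketch. This is not the authors' evaluation harness: `snapshot_a` and `snapshot_b` are hypothetical stand-ins for two dated model versions, the four prompts are toy data, and exact-match scoring is an assumed metric chosen for simplicity.

```python
import re
from typing import Callable, Dict, List, Tuple

Item = Tuple[str, str]  # (prompt, expected answer)


def accuracy(model: Callable[[str], str], items: List[Item]) -> float:
    """Fraction of prompts whose normalized answer equals the expected label."""
    correct = sum(
        1
        for prompt, expected in items
        if model(prompt).strip().lower() == expected.lower()
    )
    return correct / len(items)


def compare_snapshots(
    snapshots: Dict[str, Callable[[str], str]], items: List[Item]
) -> Dict[str, float]:
    """Score every snapshot on the identical prompt set."""
    return {name: accuracy(model, items) for name, model in snapshots.items()}


def snapshot_a(prompt: str) -> str:
    """Hypothetical stand-in: answers by actually testing primality."""
    n = int(re.search(r"\d+", prompt).group())
    is_prime = n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))
    return "yes" if is_prime else "no"


def snapshot_b(prompt: str) -> str:
    """Hypothetical stand-in: always answers 'yes', whatever the number."""
    return "yes"


# Toy prompt set (assumption): two primes and two composites.
ITEMS: List[Item] = [
    ("Is 7 a prime number? Answer yes or no.", "yes"),
    ("Is 9 a prime number? Answer yes or no.", "no"),
    ("Is 11 a prime number? Answer yes or no.", "yes"),
    ("Is 15 a prime number? Answer yes or no.", "no"),
]

results = compare_snapshots(
    {"snapshot_a": snapshot_a, "snapshot_b": snapshot_b}, ITEMS
)
print(results)  # {'snapshot_a': 1.0, 'snapshot_b': 0.5}
```

Because both snapshots see exactly the same prompts, any gap in the scores is attributable to the model version rather than to the test set.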

Evidence Extracts

| Evidence ID | Summary |
| --- | --- |
| SRC01-E01 | GPT-4 prime number accuracy dropped from 84% to 51% between March and June 2023 |
| SRC01-E02 | Performance changes were mixed: some tasks improved while others degraded |
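
To put SRC01-E01 in perspective, the drop can be expressed both in percentage points and as a relative change. The short sketch below only restates the two figures quoted above; the helper name `accuracy_change` is illustrative and does not come from the paper.

```python
from typing import Tuple


def accuracy_change(before: float, after: float) -> Tuple[float, float]:
    """Return (percentage-point change, relative change in percent)."""
    point_change = after - before
    relative_change = (after - before) / before * 100
    return point_change, relative_change


# Figures from SRC01-E01: 84% (March 2023) and 51% (June 2023).
points, relative = accuracy_change(84.0, 51.0)
print(f"{points:+.0f} percentage points, {relative:+.1f}% relative")
# -33 percentage points, -39.3% relative
```

Since the task is a two-way prime versus composite classification, 51% is close to the 50% expected from guessing on a balanced set.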