R0023/2026-03-25/Q003/SRC01
Landmark Stanford/Berkeley study documenting GPT-4 behavior changes over time
Source

| Field | Value |
| --- | --- |
| Title | How is ChatGPT's behavior changing over time? |
| Publisher | arXiv / Harvard Data Science Review |
| Author(s) | Lingjiao Chen, Matei Zaharia, James Zou |
| Date | 2023-07-18 (preprint), 2023-10-31 (final) |
| URL | https://arxiv.org/abs/2307.09009 |
| Type | Research paper |
Summary

| Dimension | Rating |
| --- | --- |
| Reliability | High |
| Relevance | High |
| Bias: Missing data | Low risk |
| Bias: Measurement | Low risk |
| Bias: Selective reporting | Low risk |
| Bias: Randomization | N/A — not an RCT |
| Bias: Protocol deviation | N/A — not an RCT |
| Bias: COI/Funding | Low risk |
Rationale

| Dimension | Rationale |
| --- | --- |
| Reliability | Stanford and UC Berkeley researchers. Published in Harvard Data Science Review (peer-reviewed). Systematic comparison of the same prompts on the March and June 2023 model versions across seven task categories. |
| Relevance | The most-cited study specifically documenting prompt degradation across model versions. Directly answers Q003. |
| Bias flags | Low risk across the board. Academic researchers with no vendor affiliation. Tested multiple task types rather than cherry-picking. |
Evidence

| Evidence ID | Summary |
| --- | --- |
| SRC01-E01 | GPT-4 prime number accuracy dropped from 84% to 51% between March and June 2023 |
| SRC01-E02 | Performance changes were mixed — some tasks improved while others degraded |