
R0023/2026-03-25/Q003/S01

WebSearch — Published evidence on prompt degradation over time and model update effects

Summary

| Field | Value |
|---|---|
| Source/Database | WebSearch (2 queries combined) |
| Query terms | (1) prompt degradation over time model updates prompts stop working LLM version changes; (2) Chen Matejka Lingjiao "How is ChatGPT behavior changing over time" |
| Filters | None |
| Results returned | 20 |
| Results selected | 5 |
| Results rejected | 15 |

Selected Results

| Result | Title | URL | Rationale |
|---|---|---|---|
| S01-R01 | How is ChatGPT's behavior changing over time? | https://arxiv.org/abs/2307.09009 | Landmark study: GPT-4 accuracy on identifying prime numbers dropped from 84% (March 2023) to 51% (June 2023) |
| S01-R02 | Deepchecks: LLM Production Challenges | https://deepchecks.com/llm-production-challenges-prompt-update-incidents/ | Industry analysis of prompt-related production incidents |
| S01-R03 | Why Your Production LLM Degrades After 90 Days | https://optimusai.ai/production-llm-90-days-and-how-to-prevent-it/ | Analysis of a 90-day degradation pattern |
| S01-R04 | Prompt Versioning Best Practices (Latitude) | https://latitude.so/blog/prompt-versioning-best-practices | Industry guidance on prompt version management |
| S01-R05 | Context Rot: Why LLMs Are Getting Dumber | https://labs.adaline.ai/p/context-rot-why-llms-are-getting | Analysis of LLM quality changes over time |

Rejected Results

| Result | Title | URL | Rationale |
|---|---|---|---|
| S01-R06 | LLM Application Lifecycle (Applied AI) | https://www.applied-ai.com/briefings/llm-application-lifecycle/ | General lifecycle overview, not focused on degradation evidence |
| S01-R07 | Why AI Output Gets Worse (Medium) | https://medium.com/@theDevDesigns/why-ai-output-sometimes-gets-worse-after-updates | Blog post, no original data |
| S01-R08 | LLM Prompt Lifecycle (base14) | https://docs.base14.io/blog/llm-prompt-lifecycle/ | Vendor content, not evidence |
| S01-R09 | AI Model Degrading (314e) | https://www.314e.com/engineering-hub/why-is-my-ai-model-s-performance-degrading | General ML drift, not prompt-specific |
| S01-R10 | LLMOps Drift Monitoring (Fiddler) | https://www.fiddler.ai/blog/how-to-monitor-llmops-performance-with-drift | Vendor product content |
| S01-R11 | Handle LLM Output Failures (Latitude) | https://latitude.so/blog/5-steps-to-handle-llm-output-failures | Practical guidance, not evidence |
| S01-R12 | VentureBeat: ChatGPT behavior changing | https://venturebeat.com/ai/not-just-in-your-head-chatgpts-behavior-is-changing-say-ai-researchers | News coverage of Chen et al., duplicate of primary source |
| S01-R13 | Tom's Hardware: ChatGPT quality decline | https://www.tomshardware.com/news/chatgpt-response-quality-decline | News coverage, duplicate |
| S01-R14 | ScienceOpen: ChatGPT behavior | https://www.scienceopen.com/document?vid=de297cba-d69f-4c80-934a-a8ac74e7f8dc | Database listing, duplicate |
| S01-R15 | ResearchGate: ChatGPT behavior | https://www.researchgate.net/publication/372445224 | Database listing, duplicate |
| S01-R16 | ar5iv HTML version | https://ar5iv.labs.arxiv.org/html/2307.09009 | Alternative rendering of S01-R01 |
| S01-R17 | ChatGPT behavior (HDSR) | https://hdsr.mitpress.mit.edu/pub/y95zitmz | Journal version of Chen et al. |
| S01-R18 | HuggingFace paper page | https://huggingface.co/papers/2307.09009 | Listing, duplicate |
| S01-R19 | AI Scholar summary | https://ai-scholar.tech/en/articles/large-language-models/how-is-chatgpt-behavior-changing-over-time | News summary, secondary |
| S01-R20 | arXiv PDF version | https://arxiv.org/pdf/2307.09009 | PDF version of S01-R01 |

Notes

The Chen et al. (2023) study dominated the results: nine of the fifteen rejected items (S01-R12 through S01-R20) are duplicate listings or secondary coverage of it. This reflects its status as the landmark empirical study of LLM behavior change across model versions, the mechanism most often blamed when prompts stop working after updates. The industry sources (Deepchecks, OptimusAI) add practitioner perspective but do not report controlled experiments.