
R0023/2026-03-25/Q003/S01

WebSearch — Published evidence on prompt degradation over time and model update effects

Summary

| Field | Value |
|---|---|
| Source/Database | WebSearch (2 queries combined) |
| Query terms | (1) prompt degradation over time model updates prompts stop working LLM version changes; (2) Chen Matejka Lingjiao "How is ChatGPT behavior changing over time" |
| Filters | None |
| Results returned | 20 |
| Results selected | 5 |
| Results rejected | 15 |

Selected Results

| Result | Title | URL | Rationale |
|---|---|---|---|
| S01-R01 | How is ChatGPT's behavior changing over time? | https://arxiv.org/abs/2307.09009 | Landmark study: GPT-4 accuracy on identifying prime numbers dropped from 84% (March 2023) to 51% (June 2023) |
| S01-R02 | Deepchecks: LLM Production Challenges | https://deepchecks.com/llm-production-challenges-prompt-update-incidents/ | Industry analysis of prompt-related production incidents |
| S01-R03 | Why Your Production LLM Degrades After 90 Days | https://optimusai.ai/production-llm-90-days-and-how-to-prevent-it/ | Analysis of a 90-day degradation pattern |
| S01-R04 | Prompt Versioning Best Practices (Latitude) | https://latitude.so/blog/prompt-versioning-best-practices | Industry guidance on prompt version management |
| S01-R05 | Context Rot: Why LLMs Are Getting Dumber | https://labs.adaline.ai/p/context-rot-why-llms-are-getting | Analysis of LLM quality changes over time |

Rejected Results

| Result | Title | URL | Rationale |
|---|---|---|---|
| S01-R06 | LLM Application Lifecycle (Applied AI) | https://www.applied-ai.com/briefings/llm-application-lifecycle/ | General lifecycle overview, not focused on degradation evidence |
| S01-R07 | Why AI Output Gets Worse (Medium) | https://medium.com/@theDevDesigns/why-ai-output-sometimes-gets-worse-after-updates | Blog post, no original data |
| S01-R08 | LLM Prompt Lifecycle (base14) | https://docs.base14.io/blog/llm-prompt-lifecycle/ | Vendor content, not evidence |
| S01-R09 | AI Model Degrading (314e) | https://www.314e.com/engineering-hub/why-is-my-ai-model-s-performance-degrading | General ML drift, not prompt-specific |
| S01-R10 | LLMOps Drift Monitoring (Fiddler) | https://www.fiddler.ai/blog/how-to-monitor-llmops-performance-with-drift | Vendor product content |
| S01-R11 | Handle LLM Output Failures (Latitude) | https://latitude.so/blog/5-steps-to-handle-llm-output-failures | Practical guidance, not evidence |
| S01-R12 | VentureBeat: ChatGPT behavior changing | https://venturebeat.com/ai/not-just-in-your-head-chatgpts-behavior-is-changing-say-ai-researchers | News coverage of Chen et al., duplicate of primary source |
| S01-R13 | Tom's Hardware: ChatGPT quality decline | https://www.tomshardware.com/news/chatgpt-response-quality-decline | News coverage, duplicate |
| S01-R14 | ScienceOpen: ChatGPT behavior | https://www.scienceopen.com/document?vid=de297cba-d69f-4c80-934a-a8ac74e7f8dc | Database listing, duplicate |
| S01-R15 | ResearchGate: ChatGPT behavior | https://www.researchgate.net/publication/372445224 | Database listing, duplicate |
| S01-R16 | ar5iv HTML version | https://ar5iv.labs.arxiv.org/html/2307.09009 | Alternative rendering of S01-R01 |
| S01-R17 | ChatGPT behavior (HDSR) | https://hdsr.mitpress.mit.edu/pub/y95zitmz | Journal version of Chen et al. |
| S01-R18 | HuggingFace paper page | https://huggingface.co/papers/2307.09009 | Listing, duplicate |
| S01-R19 | AI Scholar summary | https://ai-scholar.tech/en/articles/large-language-models/how-is-chatgpt-behavior-changing-over-time | News summary, secondary |
| S01-R20 | arXiv PDF version | https://arxiv.org/pdf/2307.09009 | PDF version of S01-R01 |

Notes

The Chen et al. (2023) study dominated the results: nine of the fifteen rejected items (S01-R12 through S01-R20) are duplicate listings or secondary coverage of it. This reflects its status as the landmark empirical study of LLM behavior change across model versions, the mechanism most often blamed when prompts stop working after updates. The industry sources (Deepchecks, OptimusAI) add practitioner perspective but do not report controlled experiments.