Q003 — ACH Matrix


Research	R0023 — Counterproductive advice and prompt lifecycle
Run	2026-03-25
Query	Q003

Matrix

Evidence	H1: Strong evidence of degradation	H2: Sparse/anecdotal evidence	H3: Complex mixed effects
SRC01-E01: GPT-4 84% to 51% accuracy drop	++	--	+
SRC01-E02: Mixed effects across task types	-	N/A	++
SRC02-E01: Stochastic variation in same model	-	+	++
SRC03-E01: Industry claims without data	+	+	N/A

Legend:

Symbol	Meaning
++	Strongly supports
+	Supports
--	Strongly contradicts
-	Contradicts
N/A	Not applicable to this hypothesis

Diagnosticity Analysis

Most Diagnostic Evidence

Evidence ID	Why Diagnostic
SRC01-E02	Mixed effects across tasks discriminate between H1 (uniform degradation) and H3 (complex reality)
SRC02-E01	Stochastic variation discriminates between H1 (all degradation is real) and H3 (some may be noise)

Least Diagnostic Evidence

Evidence ID	Why Non-Diagnostic
SRC03-E01	Industry claims without data support both H1 and H2 depending on interpretation
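
The diagnosticity judgments above can be checked mechanically with a minimal sketch. It assumes a numeric encoding of the legend symbols (++ = 2, + = 1, - = -1, -- = -2), which is not part of the source analysis, and a hypothetical helper `pairwise_gap` that measures how far apart two hypotheses are rated on one piece of evidence. A larger gap means the evidence discriminates more strongly between that pair.

```python
# Hypothetical numeric encoding of the legend symbols.
# This mapping is an assumption for illustration, not part of the source.
SCORE = {"++": 2, "+": 1, "-": -1, "--": -2, "N/A": None}

# Ratings copied from the matrix, keyed by evidence ID and hypothesis.
MATRIX = {
    "SRC01-E01": {"H1": "++", "H2": "--", "H3": "+"},
    "SRC01-E02": {"H1": "-", "H2": "N/A", "H3": "++"},
    "SRC02-E01": {"H1": "-", "H2": "+", "H3": "++"},
    "SRC03-E01": {"H1": "+", "H2": "+", "H3": "N/A"},
}


def pairwise_gap(evidence_id, hyp_a, hyp_b):
    """Return the absolute score gap between two hypotheses for one
    piece of evidence, or None if either rating is N/A."""
    a = SCORE[MATRIX[evidence_id][hyp_a]]
    b = SCORE[MATRIX[evidence_id][hyp_b]]
    if a is None or b is None:
        return None
    return abs(a - b)


# Gap between H1 and H3, and between H1 and H2, for every evidence row.
h1_vs_h3 = {eid: pairwise_gap(eid, "H1", "H3") for eid in MATRIX}
h1_vs_h2 = {eid: pairwise_gap(eid, "H1", "H2") for eid in MATRIX}

for eid in MATRIX:
    print(eid, "H1 vs H3:", h1_vs_h3[eid], "| H1 vs H2:", h1_vs_h2[eid])
```

Under this assumed encoding, SRC01-E02 and SRC02-E01 show the largest H1 versus H3 gap (3 each), while SRC03-E01 shows a gap of 0 between H1 and H2, which matches the tables above. A different encoding could change the magnitudes, so the numbers are only a consistency check on the qualitative ratings.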

Outcome

Hypothesis supported: H3 — the evidence shows mixed effects that make "degradation" an oversimplification.

Hypotheses eliminated: None fully eliminated.

Hypotheses inconclusive: H1 (partially supported: the phenomenon is real, but the evidence base is narrow) and H2 (partially supported: the evidence is sparse beyond one study, but that one study is rigorous).
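
The source does not state how the outcome was tallied, so the following is only a sketch of one common ACH convention: rank hypotheses by how much evidence contradicts them, with net support as a secondary view. The numeric weights and the names `inconsistency` and `net_support` are assumptions introduced for illustration, and N/A is treated as contributing nothing.

```python
# Hypothetical weights for the legend symbols (an assumption, not from the source).
# N/A is treated as 0 so it neither supports nor contradicts.
SCORE = {"++": 2, "+": 1, "-": -1, "--": -2, "N/A": 0}

HYPOTHESES = ["H1", "H2", "H3"]

# Ratings copied from the matrix, one list per evidence row, in H1, H2, H3 order.
MATRIX = {
    "SRC01-E01": ["++", "--", "+"],
    "SRC01-E02": ["-", "N/A", "++"],
    "SRC02-E01": ["-", "+", "++"],
    "SRC03-E01": ["+", "+", "N/A"],
}

# Total weight of contradicting evidence per hypothesis (lower is better).
inconsistency = {h: 0 for h in HYPOTHESES}
# Sum of all ratings per hypothesis (higher is better).
net_support = {h: 0 for h in HYPOTHESES}

for ratings in MATRIX.values():
    for hyp, symbol in zip(HYPOTHESES, ratings):
        value = SCORE[symbol]
        net_support[hyp] += value
        if value < 0:
            inconsistency[hyp] += -value

# Rank by least inconsistency first, breaking ties by highest net support.
ranking = sorted(HYPOTHESES, key=lambda h: (inconsistency[h], -net_support[h]))

print("inconsistency:", inconsistency)
print("net support:  ", net_support)
print("ranking:      ", ranking)
```

With these assumed weights, H3 has no contradicting evidence and the highest net support, while H1 and H2 each carry the same contradiction weight. That is consistent with the outcome above: H3 is supported, and neither H1 nor H2 is cleanly eliminated.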