Q001 — ACH Matrix


Research	R0023 — Counterproductive advice and prompt lifecycle
Run	2026-03-25
Query	Q001

Matrix

Evidence	H1: Multiple techniques counterproductive	H2: Techniques generally beneficial	H3: Effectiveness contingent
SRC01-E01: 58 techniques cataloged via PRISMA	N/A	+	+
SRC02-E01: CoT decreases accuracy in reasoning models	++	--	++
SRC02-E02: CoT introduces errors on easy questions	++	--	+
SRC03-E01: 9 negative effects from expert personas on MMLU-Pro	++	--	++
SRC03-E02: Low-knowledge personas reduce accuracy	+	-	+
SRC03-E03: Domain-matched personas provide no benefit	+	--	+
SRC04-E01: Expert persona 68.0% vs. base 71.6% (independent)	++	--	++
SRC05-E01: 60-point per-question swings masked by aggregation	+	-	++

Legend: ++ strongly supports; + supports; -- strongly contradicts; - contradicts; N/A not applicable to this hypothesis.
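The ratings in the matrix can be tallied per hypothesis. The following is a minimal sketch, not the method used to produce this report: the numeric weights (++ = 2, + = 1, - = -1, -- = -2, N/A = 0) and the names `WEIGHTS`, `MATRIX`, and `tally` are assumptions introduced for illustration. It reports a net score and an inconsistency score (the total weight of contradicting evidence), since ACH conventionally weighs evidence against a hypothesis more heavily than evidence for it.

```python
# Numeric weights for the legend symbols. These values are an assumption
# made for illustration; the source matrix is purely symbolic.
WEIGHTS = {"++": 2, "+": 1, "-": -1, "--": -2, "N/A": 0}

HYPOTHESES = ("H1", "H2", "H3")

# Ratings transcribed from the matrix above, in H1, H2, H3 order.
MATRIX = {
    "SRC01-E01": ("N/A", "+", "+"),
    "SRC02-E01": ("++", "--", "++"),
    "SRC02-E02": ("++", "--", "+"),
    "SRC03-E01": ("++", "--", "++"),
    "SRC03-E02": ("+", "-", "+"),
    "SRC03-E03": ("+", "--", "+"),
    "SRC04-E01": ("++", "--", "++"),
    "SRC05-E01": ("+", "-", "++"),
}


def tally(matrix):
    """Return {hypothesis: (net_score, inconsistency)}.

    net_score sums every weight in the column; inconsistency sums only
    the negative weights and reports them as a positive number.
    """
    result = {}
    for col, name in enumerate(HYPOTHESES):
        weights = [WEIGHTS[row[col]] for row in matrix.values()]
        net = sum(weights)
        inconsistency = -sum(w for w in weights if w < 0)
        result[name] = (net, inconsistency)
    return result


if __name__ == "__main__":
    for name, (net, inc) in tally(MATRIX).items():
        print(f"{name}: net={net:+d}, inconsistency={inc}")
```

Under these assumed weights, H2 is the only hypothesis with any contradicting evidence, while H1 and H3 both have an inconsistency of zero, so the tally alone cannot separate them.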

Diagnosticity Analysis

Most Diagnostic Evidence

Evidence ID	Why Diagnostic
SRC04-E01	Independent replication of persona failures — discriminates strongly between H2 (edge cases) and H1/H3 (systematic effects)
SRC02-E01	Model-type dependency of CoT — discriminates between H1 (universal harm) and H3 (contingent effects)
SRC05-E01	Aggregation masking — explains why H2 appears plausible from casual testing while being empirically wrong

Least Diagnostic Evidence

Evidence ID	Why Non-Diagnostic
SRC01-E01	Taxonomic survey: not applicable to H1 and rated + for both H2 and H3, so it does not discriminate between them
SRC03-E02	Low-knowledge persona failure is expected under any hypothesis, so it does not help distinguish H1 from H3
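One rough way to approximate diagnosticity is the spread of an evidence item's ratings across hypotheses: an item rated the same everywhere cannot discriminate, while one that swings from ++ to -- can. The sketch below is a heuristic under stated assumptions, not the reasoning behind the tables above, which is qualitative. The numeric weights and the names `WEIGHTS`, `MATRIX`, `spread`, and `rank_by_spread` are hypothetical, and N/A ratings are simply ignored.

```python
# Numeric weights for the legend symbols (assumed for illustration only).
WEIGHTS = {"++": 2, "+": 1, "-": -1, "--": -2}

# Ratings transcribed from the matrix, in H1, H2, H3 order.
MATRIX = {
    "SRC01-E01": ("N/A", "+", "+"),
    "SRC02-E01": ("++", "--", "++"),
    "SRC02-E02": ("++", "--", "+"),
    "SRC03-E01": ("++", "--", "++"),
    "SRC03-E02": ("+", "-", "+"),
    "SRC03-E03": ("+", "--", "+"),
    "SRC04-E01": ("++", "--", "++"),
    "SRC05-E01": ("+", "-", "++"),
}


def spread(ratings):
    """Difference between the highest and lowest rating, ignoring N/A."""
    values = [WEIGHTS[r] for r in ratings if r != "N/A"]
    return max(values) - min(values) if values else 0


def rank_by_spread(matrix):
    """Evidence IDs from most to least spread; ties keep matrix order."""
    return sorted(matrix, key=lambda eid: spread(matrix[eid]), reverse=True)


if __name__ == "__main__":
    for eid in rank_by_spread(MATRIX):
        print(f"{eid}: spread={spread(MATRIX[eid])}")
```

With these assumed weights, the two lowest-spread items are SRC03-E02 and SRC01-E01, which matches the least diagnostic table. Four items tie for the highest spread, so the heuristic does not reproduce the most diagnostic selection, which depends on what each item discriminates rather than on how far its ratings range.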

Outcome

Hypothesis supported: H3. Effectiveness is highly contingent on model, task, and context. The evidence consistently shows that the same technique produces different effects across models and conditions: CoT decreases accuracy in reasoning models (SRC02-E01), and aggregate scores mask 60-point per-question swings (SRC05-E01).

Hypotheses eliminated: H2. Seven of the eight evidence items contradict it, five of them strongly, and the persona failures are independently replicated (SRC04-E01). The evidence is too consistent across independent studies to support the claim that counterproductive findings are mere edge cases.

Hypotheses inconclusive: H1. It is partially supported: multiple techniques are indeed counterproductive, but the counterproductive effects are context-dependent rather than universal, which makes H3 the more accurate characterization.