R0023/2026-03-25/Q001 — ACH Matrix

Matrix

| Evidence | H1: Multiple techniques counterproductive | H2: Techniques generally beneficial | H3: Effectiveness contingent |
|---|---|---|---|
| SRC01-E01: 58 techniques cataloged via PRISMA | N/A | + | + |
| SRC02-E01: CoT decreases accuracy in reasoning models | ++ | -- | ++ |
| SRC02-E02: CoT introduces errors on easy questions | ++ | -- | + |
| SRC03-E01: 9 negative effects from expert personas on MMLU-Pro | ++ | -- | ++ |
| SRC03-E02: Low-knowledge personas reduce accuracy | + | - | + |
| SRC03-E03: Domain-matched personas provide no benefit | + | -- | + |
| SRC04-E01: Expert persona 68.0% vs. base 71.6% (independent) | ++ | -- | ++ |
| SRC05-E01: 60-point per-question swings masked by aggregation | + | - | ++ |

Legend:

- ++: Strongly supports
- +: Supports
- --: Strongly contradicts
- -: Contradicts
- N/A: Not applicable to this hypothesis
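The matrix above can be tallied mechanically as a cross-check on the qualitative reading. The sketch below is illustrative only: the numeric weights (`++` = 2, `+` = 1, `-` = -1, `--` = -2) and the names `WEIGHTS`, `MATRIX`, and `tally` are assumptions introduced here, not part of the source analysis. N/A cells are skipped rather than scored.

```python
# Assumed symbol-to-number mapping; the source matrix is qualitative.
# N/A maps to None and is skipped in the totals.
WEIGHTS = {"++": 2, "+": 1, "-": -1, "--": -2, "N/A": None}

HYPOTHESES = ("H1", "H2", "H3")

# Ratings transcribed from the matrix, in H1, H2, H3 order.
MATRIX = {
    "SRC01-E01": ("N/A", "+", "+"),
    "SRC02-E01": ("++", "--", "++"),
    "SRC02-E02": ("++", "--", "+"),
    "SRC03-E01": ("++", "--", "++"),
    "SRC03-E02": ("+", "-", "+"),
    "SRC03-E03": ("+", "--", "+"),
    "SRC04-E01": ("++", "--", "++"),
    "SRC05-E01": ("+", "-", "++"),
}


def tally(matrix):
    """Return (support, inconsistency) totals per hypothesis."""
    support = {h: 0 for h in HYPOTHESES}
    inconsistency = {h: 0 for h in HYPOTHESES}
    for ratings in matrix.values():
        for hyp, symbol in zip(HYPOTHESES, ratings):
            weight = WEIGHTS[symbol]
            if weight is None:
                continue  # N/A: not applicable to this hypothesis
            if weight > 0:
                support[hyp] += weight
            else:
                inconsistency[hyp] += -weight
    return support, inconsistency


support, inconsistency = tally(MATRIX)
for hyp in HYPOTHESES:
    print(f"{hyp}: support={support[hyp]}, inconsistency={inconsistency[hyp]}")
```

Under these assumed weights, H2 is the only hypothesis that accumulates any inconsistency, while H1 and H3 both score zero and differ only slightly in support. A tally of this kind cannot separate H1 from H3 on its own; that distinction rests on the qualitative reasoning in the sections that follow.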

Diagnosticity Analysis

Most Diagnostic Evidence

| Evidence ID | Why Diagnostic |
|---|---|
| SRC04-E01 | Independent replication of persona failures; discriminates strongly between H2 (edge cases) and H1/H3 (systematic effects) |
| SRC02-E01 | Model-type dependency of CoT; discriminates between H1 (universal harm) and H3 (contingent effects) |
| SRC05-E01 | Aggregation masking; explains why H2 appears plausible from casual testing while being empirically wrong |

Least Diagnostic Evidence

| Evidence ID | Why Non-Diagnostic |
|---|---|
| SRC01-E01 | Taxonomic survey; consistent with all three hypotheses, so it does not discriminate among them |
| SRC03-E02 | Low-knowledge persona failure is expected and unsurprising; it does not help distinguish H1 from H3 |
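One rough way to approximate diagnosticity is the spread between the highest and lowest rating an evidence item receives across hypotheses. The sketch below applies that idea to the five items named in the two tables above. The numeric scale and the names `WEIGHTS`, `RATINGS`, and `spread` are assumptions for illustration, and the spread is only a crude proxy for the qualitative judgments recorded here.

```python
# Assumed numeric scale for the matrix symbols (not from the source).
WEIGHTS = {"++": 2, "+": 1, "-": -1, "--": -2}

# H1, H2, H3 ratings for the five items named in the diagnosticity tables.
RATINGS = {
    "SRC04-E01": ("++", "--", "++"),
    "SRC02-E01": ("++", "--", "++"),
    "SRC05-E01": ("+", "-", "++"),
    "SRC01-E01": ("N/A", "+", "+"),
    "SRC03-E02": ("+", "-", "+"),
}


def spread(ratings):
    """Max minus min rating across hypotheses, ignoring N/A entries."""
    scores = [WEIGHTS[r] for r in ratings if r != "N/A"]
    return max(scores) - min(scores) if scores else 0


# Rank evidence from most to least discriminating under this proxy.
ranked = sorted(RATINGS, key=lambda eid: spread(RATINGS[eid]), reverse=True)
for eid in ranked:
    print(eid, spread(RATINGS[eid]))
```

Under these assumptions the three items listed as most diagnostic get the largest spreads, and SRC01-E01 gets a spread of zero. The proxy only measures separation between the best- and worst-rated hypothesis, so it does not show which pair an item discriminates. SRC02-E01, for example, is rated `++` for both H1 and H3 in the matrix, so its stated value for separating H1 from H3 comes from the reasoning in the table rather than from the ratings.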

Outcome

Hypothesis supported: H3 — effectiveness is highly contingent on model, task, and context. The evidence consistently shows that the same technique produces different effects across models and conditions.

Hypotheses eliminated: H2 — the evidence is too consistent across independent studies to support the claim that counterproductive findings are mere edge cases.

Hypotheses inconclusive: H1, which is partially supported. Multiple techniques are indeed counterproductive, but the harm is context-dependent rather than universal, which makes H3 the more accurate characterization.