R0020/2026-03-25/Q001 — ACH Matrix

Matrix

| Evidence | H1: Substantial mature ecosystem | H2: No meaningful frameworks | H3: Emerging but immature |
|---|---|---|---|
| SRC01-E01: Six dedicated testing frameworks | ++ | -- | + |
| SRC01-E02: Non-determinism and subjectivity challenges | - | N/A | ++ |
| SRC02-E01: Seven frameworks, six quality dimensions | ++ | -- | + |
| SRC02-E02: Testing-to-production gap | -- | N/A | ++ |
| SRC03-E01: Three-tier testing methodology taxonomy | + | -- | ++ |
| SRC03-E02: Unsolved challenges (unpredictability, scalability) | - | N/A | ++ |
| SRC04-E01: Golden datasets, noise mitigation, CI/CD | + | -- | ++ |

Legend:

- ++ Strongly supports
- + Supports
- -- Strongly contradicts
- - Contradicts
- N/A Not applicable to this hypothesis
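
The ratings in the matrix can be tallied mechanically as a cross-check on the qualitative reading. The sketch below is illustrative only: the numeric weights (++ = 2, + = 1, - = -1, -- = -2, N/A = 0) and the names `WEIGHTS`, `MATRIX`, and `tally` are assumptions introduced here, not part of the original analysis. It reports a net score and a count of contradicting evidence items per hypothesis.

```python
# Numeric weights for the legend symbols. These values are an assumption
# made for illustration; the source matrix is purely qualitative.
WEIGHTS = {"++": 2, "+": 1, "-": -1, "--": -2, "N/A": 0}

HYPOTHESES = ("H1", "H2", "H3")

# Ratings copied from the matrix: evidence ID -> (H1, H2, H3).
MATRIX = {
    "SRC01-E01": ("++", "--", "+"),
    "SRC01-E02": ("-", "N/A", "++"),
    "SRC02-E01": ("++", "--", "+"),
    "SRC02-E02": ("--", "N/A", "++"),
    "SRC03-E01": ("+", "--", "++"),
    "SRC03-E02": ("-", "N/A", "++"),
    "SRC04-E01": ("+", "--", "++"),
}


def tally(matrix):
    """Return the net score and contradiction count for each hypothesis."""
    result = {}
    for col, name in enumerate(HYPOTHESES):
        ratings = [row[col] for row in matrix.values()]
        net = sum(WEIGHTS[r] for r in ratings)
        # A contradiction is any evidence item rated "-" or "--".
        contradictions = sum(1 for r in ratings if WEIGHTS[r] < 0)
        result[name] = {"net": net, "contradictions": contradictions}
    return result


if __name__ == "__main__":
    for name, stats in tally(MATRIX).items():
        print(name, stats)
```

Under these assumed weights, H3 is the only hypothesis with no contradicting evidence, H2 is contradicted by every item that applies to it, and H1 sits in between.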

Diagnosticity Analysis

Most Diagnostic Evidence

| Evidence ID | Why Diagnostic |
|---|---|
| SRC02-E02 | Testing-to-production gap is uniquely diagnostic: it strongly supports H3 while strongly contradicting H1. A mature ecosystem would not have this systemic failure mode. |
| SRC04-E01 | The need for 3-5 trials with confidence intervals discriminates between H1 (mature) and H3 (emerging) by revealing the statistical nature of prompt evaluation. |
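
To make the "3-5 trials with confidence intervals" point concrete, here is a minimal sketch of how repeated runs of one prompt test suite might be summarized. It assumes each trial yields a single pass rate between 0 and 1 and uses a Student's t interval, which is one common choice for very small samples. The function name `mean_with_ci` and the sample scores are hypothetical; the source does not specify which interval method SRC04 recommends.

```python
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of
# freedom (n - 1). Only the 3-5 trial range is covered.
T_CRIT_95 = {2: 4.303, 3: 3.182, 4: 2.776}


def mean_with_ci(scores):
    """Return (mean, lower, upper) for a 95% t-interval over trial scores."""
    n = len(scores)
    if n - 1 not in T_CRIT_95:
        raise ValueError("this sketch only handles 3 to 5 trials")
    mean = statistics.mean(scores)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(scores) / n ** 0.5
    half_width = T_CRIT_95[n - 1] * sem
    return mean, mean - half_width, mean + half_width


if __name__ == "__main__":
    # Hypothetical pass rates from five runs of the same test suite.
    trial_scores = [0.80, 0.85, 0.75, 0.90, 0.80]
    mean, low, high = mean_with_ci(trial_scores)
    print(f"mean={mean:.3f}  95% CI=({low:.3f}, {high:.3f})")
```

With so few trials the interval is wide, which is the point of the evidence: a single passing run says little about a non-deterministic system.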

Least Diagnostic Evidence

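| Evidence ID | Why Non-Diagnostic |
|---|---|
| SRC01-E01 | Tool existence supports both H1 (++) and H3 (+); it counts against H2 but does not discriminate between H1 and H3. |
| SRC02-E01 | Quality dimensions likewise support both H1 (++) and H3 (+), so they do not separate the two. |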

Outcome

Hypothesis supported: H3 — Testing tools exist but the field is emerging, with fundamental challenges (non-determinism, testing-to-production gap, statistical rather than deterministic verification) that prevent characterization as a mature ecosystem.

Hypotheses eliminated: H2 — Multiple dedicated tools with CI/CD integration, regression testing, and structured evaluation eliminate the "no meaningful frameworks" hypothesis.

Hypotheses inconclusive: H1 — Tools exist, which gives H1 partial support, but the acknowledged limitations and systemic gaps prevent full support.