Q001 — ACH Matrix


Research	R0020 — Prompt Engineering Gaps
Run	2026-03-25
Query	Q001

Matrix

Evidence	H1: Substantial mature ecosystem	H2: No meaningful frameworks	H3: Emerging but immature
SRC01-E01: Six dedicated testing frameworks	++	--	+
SRC01-E02: Non-determinism and subjectivity challenges	-	N/A	++
SRC02-E01: Seven frameworks, six quality dimensions	++	--	+
SRC02-E02: Testing-to-production gap	--	N/A	++
SRC03-E01: Three-tier testing methodology taxonomy	+	--	++
SRC03-E02: Unsolved challenges (unpredictability, scalability)	-	N/A	++
SRC04-E01: Golden datasets, noise mitigation, CI/CD	+	--	++

Legend:
- ++: Strongly supports
- +: Supports
- --: Strongly contradicts
- -: Contradicts
- N/A: Not applicable to this hypothesis

Diagnosticity Analysis

Most Diagnostic Evidence

Evidence ID	Why Diagnostic
SRC02-E02	Testing-to-production gap is uniquely diagnostic: it strongly supports H3 while strongly contradicting H1. A mature ecosystem would not have this systemic failure mode.
SRC04-E01	The need for 3-5 trials with confidence intervals discriminates between H1 (mature) and H3 (emerging) by revealing the statistical nature of prompt evaluation.

Least Diagnostic Evidence

Evidence ID	Why Non-Diagnostic
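SRC01-E01	Tool existence supports both H1 (++) and H3 (+), so it discriminates only weakly between them.
SRC02-E01	The framework count and quality dimensions likewise support both H1 (++) and H3 (+); the one-step difference in strength does little to separate them.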
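The H1-versus-H3 comparison above can be approximated numerically. The sketch below is illustrative only: the mapping of legend symbols to integers and the helper name `h1_h3_gap` are assumptions, not part of the original analysis. It measures how far apart each evidence item's H1 and H3 ratings sit. Under this crude measure SRC02-E02 has the widest gap, and SRC01-E01 and SRC02-E01 have the narrowest. SRC04-E01 scores the same as those two, so its place in the most-diagnostic table rests on a qualitative judgment that the numeric gap does not capture.

```python
# Assumed numeric encoding of the legend symbols (illustrative, not from the source).
SCORE = {"++": 2, "+": 1, "-": -1, "--": -2, "N/A": None}

# Ratings copied from the matrix, in (H1, H2, H3) order.
MATRIX = {
    "SRC01-E01": ("++", "--", "+"),
    "SRC01-E02": ("-", "N/A", "++"),
    "SRC02-E01": ("++", "--", "+"),
    "SRC02-E02": ("--", "N/A", "++"),
    "SRC03-E01": ("+", "--", "++"),
    "SRC03-E02": ("-", "N/A", "++"),
    "SRC04-E01": ("+", "--", "++"),
}


def h1_h3_gap(ratings):
    """Absolute distance between the H1 and H3 ratings of one evidence item."""
    h1, _h2, h3 = (SCORE[r] for r in ratings)
    return abs(h1 - h3)


gaps = {evidence_id: h1_h3_gap(r) for evidence_id, r in MATRIX.items()}

# List evidence from widest to narrowest H1/H3 gap.
for evidence_id, gap in sorted(gaps.items(), key=lambda kv: -kv[1]):
    print(f"{evidence_id}: {gap}")
```

A larger gap means the evidence pulls H1 and H3 further apart. H2 is left out because the discussion above concerns only the H1/H3 distinction, and several H2 cells are N/A.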

Outcome

Hypothesis supported: H3 — Testing tools exist but the field is emerging, with fundamental challenges (non-determinism, testing-to-production gap, statistical rather than deterministic verification) that prevent characterization as a mature ecosystem.

Hypotheses eliminated: H2 — Multiple dedicated tools with CI/CD integration, regression testing, and structured evaluation eliminate the "no meaningful frameworks" hypothesis.

Hypotheses inconclusive: H1 — Dedicated tools exist, which gives H1 partial support, but the acknowledged limitations and systemic gaps prevent full support.
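As a rough cross-check on this outcome, the matrix can be tallied per hypothesis. The sketch below assumes a simple integer encoding of the legend symbols and a hypothetical helper `tally`; neither comes from the original analysis. It sums supporting and contradicting ratings separately, since ACH conventionally favors the hypothesis with the least contradicting evidence rather than the most supporting evidence.

```python
# Assumed numeric encoding of the legend symbols (illustrative, not from the source).
SCORE = {"++": 2, "+": 1, "-": -1, "--": -2, "N/A": None}
HYPOTHESES = ("H1", "H2", "H3")

# Ratings copied from the matrix, in (H1, H2, H3) order.
MATRIX = {
    "SRC01-E01": ("++", "--", "+"),
    "SRC01-E02": ("-", "N/A", "++"),
    "SRC02-E01": ("++", "--", "+"),
    "SRC02-E02": ("--", "N/A", "++"),
    "SRC03-E01": ("+", "--", "++"),
    "SRC03-E02": ("-", "N/A", "++"),
    "SRC04-E01": ("+", "--", "++"),
}


def tally(matrix):
    """Sum supporting and contradicting weights per hypothesis, skipping N/A cells."""
    totals = {h: {"support": 0, "inconsistency": 0} for h in HYPOTHESES}
    for ratings in matrix.values():
        for hypothesis, rating in zip(HYPOTHESES, ratings):
            value = SCORE[rating]
            if value is None:
                continue  # N/A: this evidence does not bear on the hypothesis
            if value > 0:
                totals[hypothesis]["support"] += value
            else:
                totals[hypothesis]["inconsistency"] += -value
    return totals


totals = tally(MATRIX)
for hypothesis in HYPOTHESES:
    print(hypothesis, totals[hypothesis])
```

With this encoding, H3 accumulates no inconsistency, H2 accumulates the most inconsistency and no support, and H1 falls in between. That ordering matches the supported, eliminated, and inconclusive verdicts above.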