Q002 — ACH Matrix


Research	R0027 — Multilingual prompt engineering challenges
Run	2026-03-26
Query	Q002

Matrix

Evidence	H1: Structure is primary challenge	H2: Computation is primary, structure secondary	H3: Structure causes challenges via computational mediation
SRC01-E01: Linguistic features influence effectiveness	++	-	+
SRC01-E02: Japanese subjects, Arabic gender, Finnish cases	++	-	+
SRC02-E01: Arabic complexity defeats Arabic-centric models	+	+	++
SRC03-E01: Tokenization fertility predicts accuracy (8-18pp)	+	++	++
SRC04-E01: ~2% language nuances, 72-87% model limitations	--	++	++
SRC05-E01: Agglutinative languages break tokenization	+	+	++

Legend:

++ Strongly supports
+ Supports
-- Strongly contradicts
- Contradicts
N/A Not applicable to this hypothesis
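
The matrix can be tallied numerically as a rough cross-check. The sketch below is a minimal illustration, not the method used for this run: the numeric weights (`++` = 2, `+` = 1, `-` = -1, `--` = -2, `N/A` = 0) and the function names are assumptions introduced here. It follows the usual ACH convention of focusing on contradicting evidence, so a lower inconsistency score means a hypothesis survives better.

```python
# Assumed numeric weights for the legend symbols (not part of the source).
WEIGHTS = {"++": 2, "+": 1, "-": -1, "--": -2, "N/A": 0}

HYPOTHESES = ["H1", "H2", "H3"]

# Ratings transcribed from the matrix, in H1, H2, H3 order.
MATRIX = {
    "SRC01-E01": ["++", "-", "+"],
    "SRC01-E02": ["++", "-", "+"],
    "SRC02-E01": ["+", "+", "++"],
    "SRC03-E01": ["+", "++", "++"],
    "SRC04-E01": ["--", "++", "++"],
    "SRC05-E01": ["+", "+", "++"],
}


def inconsistency_scores(matrix, weights=WEIGHTS):
    """Sum the magnitudes of contradicting ratings for each hypothesis."""
    totals = {h: 0 for h in HYPOTHESES}
    for ratings in matrix.values():
        for hypothesis, symbol in zip(HYPOTHESES, ratings):
            weight = weights[symbol]
            if weight < 0:
                totals[hypothesis] += -weight
    return totals


def net_scores(matrix, weights=WEIGHTS):
    """Sum all ratings (supporting and contradicting) for each hypothesis."""
    totals = {h: 0 for h in HYPOTHESES}
    for ratings in matrix.values():
        for hypothesis, symbol in zip(HYPOTHESES, ratings):
            totals[hypothesis] += weights[symbol]
    return totals


if __name__ == "__main__":
    print("Inconsistency:", inconsistency_scores(MATRIX))
    print("Net support:  ", net_scores(MATRIX))
```

Under these assumed weights, H3 is the only hypothesis with no contradicting evidence. A different weighting scheme could change the net totals, so the numbers are best read as a sanity check on the qualitative ratings rather than as a result in their own right.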

Diagnosticity Analysis

Most Diagnostic Evidence

Evidence ID	Why Diagnostic
SRC04-E01	Strongly contradicts H1 while strongly supporting H2 and H3 — the ~2% vs 72-87% split is the most discriminating finding
SRC02-E01	Supports H3 over H1 — if the challenge were structural, Arabic-centric models should handle Arabic better, but they do not
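
One crude numeric proxy for diagnosticity is the spread between the highest and lowest rating in a matrix row: evidence rated the same under every hypothesis has a spread of zero and cannot discriminate. The sketch below is illustrative only; the weights and the `diagnosticity` helper are assumptions introduced here, not the procedure used for this run.

```python
# Assumed numeric weights for the legend symbols (not part of the source).
WEIGHTS = {"++": 2, "+": 1, "-": -1, "--": -2, "N/A": 0}

# Ratings transcribed from the matrix, in H1, H2, H3 order.
MATRIX = {
    "SRC01-E01": ["++", "-", "+"],
    "SRC01-E02": ["++", "-", "+"],
    "SRC02-E01": ["+", "+", "++"],
    "SRC03-E01": ["+", "++", "++"],
    "SRC04-E01": ["--", "++", "++"],
    "SRC05-E01": ["+", "+", "++"],
}


def diagnosticity(matrix, weights=WEIGHTS):
    """Return the max-minus-min rating spread for each evidence row."""
    spreads = {}
    for evidence_id, ratings in matrix.items():
        scores = [weights[symbol] for symbol in ratings]
        spreads[evidence_id] = max(scores) - min(scores)
    return spreads


# Rank evidence from widest to narrowest spread (ties keep matrix order).
ranked = sorted(diagnosticity(MATRIX).items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    for evidence_id, spread in ranked:
        print(f"{evidence_id}: spread {spread}")
```

With these assumed weights, SRC04-E01 has the widest spread, in line with its listing as the most discriminating finding. The proxy measures only how far apart the ratings are, not which hypotheses they separate, so it complements the qualitative reasoning and does not replace it.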

Least Diagnostic Evidence

Evidence ID	Why Non-Diagnostic
SRC01-E01	Strongly supports H1, supports H3, and contradicts H2 (per the matrix); it confirms that challenges exist but does not identify the mechanism

Outcome

Hypothesis supported: H3. Linguistic structure causes the challenges, but mainly through the mediation of tokenization and training data

Hypotheses eliminated: None fully eliminated — H1 and H2 each capture part of the picture

Hypotheses inconclusive: H1 (partially supported — challenges exist but mechanism is computational); H2 (partially supported — computation dominates but linguistic structure is the upstream cause)