R0027/2026-03-26/Q001 - ACH Matrix
Matrix
| Evidence | H1: Significant gap exists | H2: No meaningful gap | H3: Conditional gap |
|---|---|---|---|
| SRC01-E01: 36 papers studying multilingual prompting | + | -- | + |
| SRC01-E02: Native prompts win on some tasks | - | N/A | ++ |
| SRC02-E01: Selective pre-translation outperforms | + | -- | ++ |
| SRC02-E02: 200%+ improvement for low-resource | + | -- | ++ |
| SRC03-E01: XLT achieves 10+ point improvement | + | -- | ++ |
| SRC04-E01: Scaling does not close gap | ++ | -- | + |
| SRC05-E01: 30-point English-Swahili gap | ++ | -- | + |
| SRC06-E01: English beats Arabic on Arabic model | ++ | -- | + |
| SRC07-E01: Hindi 63.1%, Mandarin 64.6%, Arabic 67.4% vs English 70.9% | ++ | -- | + |
| SRC07-E02: English prompts 72.7% vs translated 67.2% | ++ | -- | + |
| SRC08-E01: 8-18pp per token/word; reasoning models narrow gap | + | -- | ++ |
Legend:

| Symbol | Meaning |
|---|---|
| ++ | Strongly supports |
| + | Supports |
| -- | Strongly contradicts |
| - | Contradicts |
| N/A | Not applicable to this hypothesis |
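
The matrix can also be read as a rough tally. The sketch below transcribes the ratings and sums them per hypothesis. The numeric weights (`++` = 2, `+` = 1, `-` = -1, `--` = -2) and the helper name `tally` are assumptions introduced for illustration only; the source assigns no numbers to the symbols, and the analysis itself is qualitative.

```python
# Assumed numeric weights for the legend symbols (not part of the source).
WEIGHTS = {"++": 2, "+": 1, "-": -1, "--": -2, "N/A": None}

HYPOTHESES = ("H1", "H2", "H3")

# Ratings transcribed from the matrix, in column order (H1, H2, H3).
MATRIX = {
    "SRC01-E01": ("+", "--", "+"),
    "SRC01-E02": ("-", "N/A", "++"),
    "SRC02-E01": ("+", "--", "++"),
    "SRC02-E02": ("+", "--", "++"),
    "SRC03-E01": ("+", "--", "++"),
    "SRC04-E01": ("++", "--", "+"),
    "SRC05-E01": ("++", "--", "+"),
    "SRC06-E01": ("++", "--", "+"),
    "SRC07-E01": ("++", "--", "+"),
    "SRC07-E02": ("++", "--", "+"),
    "SRC08-E01": ("+", "--", "++"),
}


def tally(matrix):
    """Return, per hypothesis, the net score and the number of contradicting items."""
    result = {h: {"net": 0, "contradicting": 0} for h in HYPOTHESES}
    for ratings in matrix.values():
        for hypothesis, symbol in zip(HYPOTHESES, ratings):
            weight = WEIGHTS[symbol]
            if weight is None:  # N/A: the item is skipped for this hypothesis
                continue
            result[hypothesis]["net"] += weight
            if weight < 0:
                result[hypothesis]["contradicting"] += 1
    return result


scores = tally(MATRIX)
for hypothesis in HYPOTHESES:
    print(hypothesis, scores[hypothesis])
```

Under these assumed weights H3 has no contradicting items, H1 has one, and H2 is contradicted by every item that applies to it. The tally is only a cross-check on the matrix, not a substitute for the diagnosticity reasoning.
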
Diagnosticity Analysis
Most Diagnostic Evidence
| Evidence ID | Why Diagnostic |
|---|---|
| SRC01-E02 | Contradicts H1 (simple gap) while strongly supporting H3 (conditional), so it discriminates between H1 and H3 |
| SRC08-E01 | Supports H1 and H3 for different reasons: the gap exists (H1), but reasoning models narrow it (H3) |
| SRC02-E01 | Selective pre-translation yields task-dependent optimal strategies, which discriminates H3 from H1 |
Least Diagnostic Evidence
| Evidence ID | Why Non-Diagnostic |
|---|---|
| SRC01-E01 | Supports H1 and H3 equally: confirms that research exists but does not discriminate between them |
| SRC04-E01 | Strongly supports H1 and supports H3 less strongly: establishes the magnitude of the gap but gives no detail on conditionality |
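
One simple numeric proxy for diagnosticity between H1 and H3 is the absolute difference of an item's two ratings. The sketch below applies it to the five items discussed above. The weights and the `spread` helper are assumptions for illustration, not the method used in the source.

```python
# Assumed numeric weights for the legend symbols (not part of the source).
WEIGHTS = {"++": 2, "+": 1, "-": -1, "--": -2}

# (H1, H3) ratings for the five items named in the diagnosticity tables.
H1_H3 = {
    "SRC01-E02": ("-", "++"),
    "SRC08-E01": ("+", "++"),
    "SRC02-E01": ("+", "++"),
    "SRC01-E01": ("+", "+"),
    "SRC04-E01": ("++", "+"),
}


def spread(h1_symbol, h3_symbol):
    """Absolute rating difference: larger means the item separates H1 from H3 more."""
    return abs(WEIGHTS[h1_symbol] - WEIGHTS[h3_symbol])


# Items ordered from most to least diagnostic under this proxy.
ranked = sorted(H1_H3, key=lambda item: spread(*H1_H3[item]), reverse=True)
for item in ranked:
    print(item, spread(*H1_H3[item]))
```

The proxy agrees with the tables at the extremes: SRC01-E02 separates the two hypotheses most, and SRC01-E01 not at all. It cannot separate SRC04-E01 from SRC08-E01 or SRC02-E01, which all differ by one step, so that part of the ranking rests on the qualitative reasons given in the tables.
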
Outcome
Hypothesis supported: H3 — The performance gap is real but conditional on language resource level, task type, model architecture, and prompting strategy
Hypotheses eliminated: H2 — No evidence supports equivalent performance across languages; every source contradicts it
Hypotheses inconclusive: H1 — Partially supported but oversimplifies; the gap is not a simple, uniform degradation