R0027/2026-03-26/Q001/H3¶
Statement¶
Performance gaps exist but are highly conditional — varying significantly by language resource level, task type, model architecture, and prompting strategy. The relationship between prompt language and effectiveness is not a simple hierarchy but a multi-dimensional interaction.
Status¶
Current: Supported
H3 is the best-supported hypothesis. The evidence consistently shows that while performance gaps are real, they are shaped by at least four interacting factors: (1) language resource level in training data, (2) task type (extractive vs generative, reasoning vs understanding), (3) model architecture and scale, and (4) prompting strategy (native, translated, selective, cross-lingual). No single factor explains the full pattern.
Supporting Evidence¶
| Evidence | Summary |
|---|---|
| SRC02-E01 | Selective pre-translation outperforms both full translation and native; optimal strategy varies by task |
| SRC02-E02 | 200%+ improvement for low-resource languages with right strategy — language resource level matters |
| SRC01-E02 | Native prompts outperform English on sentiment/coreference — task type matters |
| SRC06-E01 | GPT-4o shows minimal gap while Jais struggles — model matters |
| SRC08-E01 | Reasoning models narrow gap by 8-12 points — model type matters |
| SRC03-E01 | XLT reduces gap by 10+ points — prompting strategy matters |
| SRC05-E01 | Performance hierarchy varies: European > East Asian > South Asian — resource level matters |
Contradicting Evidence¶
No evidence directly contradicts H3. All evidence is consistent with a conditional, multi-factor relationship.
Reasoning¶
H3 is the most faithful representation of the evidence. The performance gap between English and non-English is real (consistent with H1) but cannot be described by a single number or simple rule. The optimal prompting strategy for a given language depends on the task type, the model being used, and the language's resource level. This conditionality is the central finding of the research literature.
Relationship to Other Hypotheses¶
H3 subsumes H1 — both agree the gap exists, but H3 adds the crucial qualifier that it is conditional. H2 is eliminated. H3 is the preferred hypothesis.