
R0027/2026-03-26/Q001/H3

Statement

Performance gaps exist but are highly conditional — varying significantly by language resource level, task type, model architecture, and prompting strategy. The relationship between prompt language and effectiveness is not a simple hierarchy but a multi-dimensional interaction.

Status

Current: Supported

H3 is the best-supported hypothesis. The evidence consistently shows that while performance gaps are real, they are shaped by at least four interacting factors: (1) language resource level in training data, (2) task type (extractive vs generative, reasoning vs understanding), (3) model architecture and scale, and (4) prompting strategy (native, translated, selective, cross-lingual). No single factor explains the full pattern.

Supporting Evidence

| Evidence | Summary |
| --- | --- |
| SRC02-E01 | Selective pre-translation outperforms both full translation and native prompting; the optimal strategy varies by task |
| SRC02-E02 | 200%+ improvement for low-resource languages with the right strategy; language resource level matters |
| SRC01-E02 | Native prompts outperform English on sentiment and coreference tasks; task type matters |
| SRC06-E01 | GPT-4o shows a minimal gap while Jais struggles; model matters |
| SRC08-E01 | Reasoning models narrow the gap by 8–12 points; model type matters |
| SRC03-E01 | XLT reduces the gap by 10+ points; prompting strategy matters |
| SRC05-E01 | Performance varies by language group (European > East Asian > South Asian); resource level matters |
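The claim that four factors interact can be checked against the evidence by grouping each item under the factor its summary says it illustrates. The following is a minimal bookkeeping sketch. It assumes the factor labels below are a fair reading of each summary, including the double tagging of SRC02-E01 (it compares strategies and finds the best one varies by task). The names `EVIDENCE_FACTORS`, `group_by_factor`, and `uncovered_factors` are hypothetical. The sketch shows only that every factor has at least one supporting item and that no single factor accounts for every item. It does not measure effect sizes.

```python
from collections import defaultdict

# Hypothetical tagging: each evidence ID mapped to the factor(s) its summary
# says it illustrates. Labels are this sketch's reading of the evidence table.
EVIDENCE_FACTORS = {
    "SRC02-E01": ["task type", "prompting strategy"],
    "SRC02-E02": ["resource level"],
    "SRC01-E02": ["task type"],
    "SRC06-E01": ["model"],
    "SRC08-E01": ["model"],
    "SRC03-E01": ["prompting strategy"],
    "SRC05-E01": ["resource level"],
}

# The four interacting factors named in the Status section.
FACTORS = ["resource level", "task type", "model", "prompting strategy"]


def group_by_factor(evidence_factors):
    """Invert the evidence -> factors mapping into factor -> evidence IDs."""
    grouped = defaultdict(list)
    for evidence_id, factors in evidence_factors.items():
        for factor in factors:
            grouped[factor].append(evidence_id)
    return dict(grouped)


def uncovered_factors(grouped, factors):
    """Return the factors that no evidence item supports."""
    return [f for f in factors if not grouped.get(f)]


if __name__ == "__main__":
    grouped = group_by_factor(EVIDENCE_FACTORS)
    for factor in FACTORS:
        print(f"{factor}: {', '.join(grouped[factor])}")
    print("uncovered:", uncovered_factors(grouped, FACTORS))
    # No single factor is tagged on every evidence item.
    print("largest single-factor share:",
          max(len(ids) for ids in grouped.values()), "of", len(EVIDENCE_FACTORS))
```

Inverting the mapping keeps the per-item tagging as the single source of truth. Re-tagging an item therefore updates the factor view automatically.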

Contradicting Evidence

No evidence directly contradicts H3. All evidence is consistent with a conditional, multi-factor relationship.

Reasoning

H3 is the most faithful representation of the evidence. The performance gap between English and non-English prompts is real (consistent with H1) but cannot be described by a single number or simple rule. The optimal prompting strategy for a given language depends on the task type, the model being used, and the language's resource level. This conditionality is the central finding of the research literature.

Relationship to Other Hypotheses

H3 subsumes H1 — both agree the gap exists, but H3 adds the crucial qualifier that it is conditional. H2 is eliminated. H3 is the preferred hypothesis.