R0027/2026-03-26/Q001/SRC05/E01

Research: R0027 - Multilingual prompt engineering challenges
Run: 2026-03-26
Query: Q001
Source: SRC05
Evidence: SRC05-E01
Type: Statistical

30-point English-Swahili performance gap with clear language family hierarchy

URL: https://arxiv.org/html/2503.10497v1

Extract

The best model (Qwen2.5-72B) achieves 70.3% on English but only 40.1% on Swahili — a 30.2-point gap. Performance follows a clear hierarchy: "English > European languages > East Asian languages > South Asian/low-resource languages." Even the best models show 20-30 point drops on low-resource languages. "Reasoning-enhanced training yields inconsistent benefits across languages."
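The gap arithmetic above can be checked with a minimal sketch. The two scores come from the extract; the function names and the relative-drop measure are illustrative assumptions added here, not something the source reports.

```python
# Scores quoted in the extract for the best model (Qwen2.5-72B), in percent.
ENGLISH_SCORE = 70.3
SWAHILI_SCORE = 40.1


def absolute_gap(high: float, low: float) -> float:
    """Gap in percentage points, rounded to one decimal place."""
    return round(high - low, 1)


def relative_drop(high: float, low: float) -> float:
    """Drop as a percentage of the higher score (hypothetical extra measure)."""
    return round(100 * (high - low) / high, 1)


print(absolute_gap(ENGLISH_SCORE, SWAHILI_SCORE))   # gap in points
print(relative_drop(ENGLISH_SCORE, SWAHILI_SCORE))  # drop relative to English
```

The absolute gap is the figure the source cites. The relative drop is only a complementary way to read the same two numbers: under this sketch, Swahili performance is roughly 43% lower than English.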

Relevance to Hypotheses

Hypothesis | Relationship | Strength
H1 | Supports | Provides the most precise quantification of the cross-language gap: up to 30 points
H2 | Contradicts | Unambiguous, large-magnitude differences across all tested languages
H3 | Supports | The gap varies by language family (European languages closest to English, South Asian and low-resource languages furthest)

Context

The performance hierarchy (English > European > East Asian > South Asian/low-resource) is consistent across multiple benchmarks, strengthening the finding's reliability.