Research R0027 — Multilingual prompt engineering challenges
Run 2026-03-26
Query Q001
Source SRC04
Evidence SRC04-E01
Type Statistical

High-resource languages consistently outperform low-resource ones; model scaling does not close the gap

URL: https://arxiv.org/html/2502.07346v1

Extract

"High-resource languages such as French and Chinese consistently outperform low-resource languages like Telugu, Swahili, and Bengali." DeepSeek-V3 showed 50%+ accuracy in science reasoning for English/French but dropped below 40% for Telugu. Critically, "the proportion of larger models achieving smaller GAPs only slightly exceeds 0.5 for most model families" — meaning model size increases do not reliably reduce the cross-language performance gap.

Relevance to Hypotheses

Hypothesis | Relationship | Strength
H1 | Supports | 10+ percentage point gaps confirmed across 17 languages
H2 | Contradicts | Clear, consistent performance hierarchy documented
H3 | Supports | Gap magnitude varies by language resource level and task type

Context

The finding that model scaling does not close the gap is significant: it suggests the problem is structural, rooted in training data coverage and tokenization, rather than simply a matter of model capacity.
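
As a rough illustration of the tokenization point (not from the source), the sketch below counts tokens for "thank you" across several of the languages the extract names, using the open-source tiktoken library; subword vocabularies trained mostly on high-resource text typically spend many more tokens per word on low-resource scripts.

```python
# Minimal sketch of why tokenization is one structural suspect: byte-pair
# tokenizers trained mostly on high-resource text split low-resource
# scripts into far more tokens for the same meaning.
# Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a widely used byte-level BPE vocabulary

# "Thank you" in each language (common equivalents, for illustration only)
samples = {
    "English": "thank you",
    "French": "merci",
    "Chinese": "谢谢",
    "Telugu": "ధన్యవాదాలు",
    "Swahili": "asante",
    "Bengali": "ধন্যবাদ",
}

for lang, text in samples.items():
    tokens = enc.encode(text)
    # More tokens for the same meaning means fewer effective words per
    # context window and a weaker learned representation per token.
    print(f"{lang:8s} {text!r:22s} -> {len(tokens)} tokens")
```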