R0027/2026-03-26/Q001/SRC04/E01
High-resource languages consistently outperform low-resource languages; model scaling does not close the gap
URL: https://arxiv.org/html/2502.07346v1
Extract
"High-resource languages such as French and Chinese consistently outperform low-resource languages like Telugu, Swahili, and Bengali." DeepSeek-V3 showed 50%+ accuracy in science reasoning for English/French but dropped below 40% for Telugu. Critically, "the proportion of larger models achieving smaller GAPs only slightly exceeds 0.5 for most model families" — meaning model size increases do not reliably reduce the cross-language performance gap.
Relevance to Hypotheses
| Hypothesis | Relationship | Evidence |
|---|---|---|
| H1 | Supports | 10+ percentage point gaps confirmed across 17 languages |
| H2 | Contradicts | Clear, consistent performance hierarchy documented |
| H3 | Supports | Gap magnitude varies by language resource level and task type |
Context
The finding that model scaling does not close the gap is significant: it suggests the problem is structural (rooted in training data coverage and tokenization) rather than simply a matter of model capacity.