R0027/2026-03-26/Q001/SRC06/E01
English prompts outperform Arabic prompts even on Arabic-centric models
URL: https://arxiv.org/html/2409.07054v1
Extract
"Non-native prompt performs the best, followed by mixed and native prompts" across 197 experiments. Critically, even Jais-13b-chat, an Arabic-centric model, "showed best results with non-native prompts and struggled significantly with native Arabic instructions." GPT-4o showed the smallest gap between prompt languages. Few-shot learning improved performance notably compared to zero-shot approaches.
Relevance to Hypotheses
| Hypothesis | Relationship | Strength |
|---|---|---|
| H1 | Supports | Demonstrates clear performance variation between English and Arabic prompts |
| H2 | Contradicts | Consistent finding across 3 models and 12 datasets |
| H3 | Supports | The gap varies by model: GPT-4o shows minimal difference, while Jais struggles significantly |
Context
The finding that even an Arabic-centric model performs better with English prompts is striking. It suggests the English advantage is structural, embedded in how models are trained, rather than a simple gap in language capability.