
Research R0027 — Multilingual prompt engineering challenges
Run 2026-03-26
Query Q001
Source SRC06
Evidence SRC06-E01
Type Statistical

English prompts outperform Arabic prompts even on Arabic-centric models

URL: https://arxiv.org/html/2409.07054v1

Extract

"Non-native prompt performs the best, followed by mixed and native prompts" across 197 experiments. Critically, even Jais-13b-chat, an Arabic-centric model, "showed best results with non-native prompts and struggled significantly with native Arabic instructions." GPT-4o showed the smallest gap between prompt languages. Few-shot learning improved performance notably compared to zero-shot approaches.

Relevance to Hypotheses

Hypothesis | Relationship | Rationale
H1 | Supports | Demonstrates clear performance variation between English and Arabic prompts
H2 | Contradicts | The finding is consistent across 3 models and 12 datasets
H3 | Supports | The gap varies by model: GPT-4o shows minimal difference while Jais-13b-chat struggles significantly

Context

The finding that even an Arabic-centric model performs better with English prompts is striking. It suggests that the advantage of English is structural, embedded in how models are trained, rather than a simple gap in language capability.