
Research R0027 — Multilingual prompt engineering challenges
Run 2026-03-26
Query Q001
Source SRC07
Evidence SRC07-E01
Type Statistical

Per-language accuracy data for Hindi, Mandarin, and Arabic — all below English

URL: https://arxiv.org/html/2504.17720v2

Extract

Average performance across educational tasks: English 70.9%, Arabic 67.4%, German 66.8%, Farsi 66.2%, Mandarin 64.6%, Hindi 63.1%, Czech 55.3%, Telugu 49.7%. Performance correlates with CommonCrawl representation: Telugu (0.02% of CommonCrawl) showed the poorest results.
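The per-language gaps relative to English cited under H3 below can be recomputed directly from these figures; a minimal sketch (language names and scores taken from the extract, everything else illustrative):

```python
# Per-language average accuracy (%) across educational tasks, from the extract.
accuracy = {
    "English": 70.9, "Arabic": 67.4, "German": 66.8, "Farsi": 66.2,
    "Mandarin": 64.6, "Hindi": 63.1, "Czech": 55.3, "Telugu": 49.7,
}

# Gap vs. English in percentage points, rounded to one decimal place.
gap_vs_english = {
    lang: round(accuracy["English"] - score, 1)
    for lang, score in accuracy.items()
    if lang != "English"
}

# Print languages from smallest to largest gap.
for lang, gap in sorted(gap_vs_english.items(), key=lambda kv: kv[1]):
    print(f"{lang}: -{gap} pp")
```

Running this confirms the spread referenced in the hypothesis table: Arabic trails English by 3.5 pp, Hindi by 7.8 pp, and Telugu by 21.2 pp.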

Relevance to Hypotheses

Hypothesis | Relationship | Notes
H1 | Supports | Quantifies the gap for the specific languages asked about in Q001
H2 | Contradicts | Clear numerical differences across all tested languages
H3 | Supports | Gap magnitude varies: Arabic is closer to English (3.5 pp) than Hindi (7.8 pp) or Telugu (21.2 pp)

Context

This study is particularly valuable because it tests Hindi, Mandarin, and Arabic — three of the four languages specifically named in Q001.