
Research R0027 — Multilingual prompt engineering challenges
Run 2026-03-26
Query Q001
Source SRC07
Evidence SRC07-E01
Type Statistical

Per-language accuracy data for Hindi, Mandarin, and Arabic — all below English

URL: https://arxiv.org/html/2504.17720v2

Extract

Average performance across educational tasks: English 70.9%, Arabic 67.4%, German 66.8%, Farsi 66.2%, Mandarin 64.6%, Hindi 63.1%, Czech 55.3%, Telugu 49.7%. Performance correlates with CommonCrawl representation: Telugu (0.02% of CommonCrawl) showed the poorest results.
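The per-language gaps relative to English cited under H3 below can be recomputed directly from these figures; a minimal sketch (language names and scores taken from the extract, everything else illustrative):

```python
# Per-language average accuracy (%) across educational tasks, from the extract.
accuracy = {
    "English": 70.9, "Arabic": 67.4, "German": 66.8, "Farsi": 66.2,
    "Mandarin": 64.6, "Hindi": 63.1, "Czech": 55.3, "Telugu": 49.7,
}

# Gap vs. English in percentage points, rounded to one decimal place.
gap_vs_english = {
    lang: round(accuracy["English"] - score, 1)
    for lang, score in accuracy.items()
    if lang != "English"
}

# Print languages from smallest to largest gap.
for lang, gap in sorted(gap_vs_english.items(), key=lambda kv: kv[1]):
    print(f"{lang}: -{gap} pp")
```

Running this confirms the spread referenced in the hypothesis table: Arabic trails English by 3.5 pp, Hindi by 7.8 pp, and Telugu by 21.2 pp.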

Relevance to Hypotheses

Hypothesis | Relationship | Notes
H1 | Supports | Quantifies the gap for the specific languages asked about in Q001
H2 | Contradicts | Clear numerical differences across all tested languages
H3 | Supports | Gap magnitude varies: Arabic is closer to English (3.5 pp) than Hindi (7.8 pp) or Telugu (21.2 pp)

Context

This study is particularly valuable because it tests Hindi, Mandarin, and Arabic — three of the four languages specifically named in Q001.