R0027/2026-03-26/Q001/S02/R02
13-language multilingual MMLU benchmark with 11,829 questions per language
Summary
| Field | Value |
|---|---|
| Title | MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation |
| URL | https://arxiv.org/html/2503.10497v1 |
| Date accessed | 2026-03-26 |
| Publication date | 2025-03-13 |
| Author(s) | Weihao Xuan, Rui Yang, et al. |
| Publication | arXiv preprint |
Selection Decision
Included in evidence base: Yes
Rationale: Largest standardized cross-linguistic benchmark found. Its 11,829 identical questions in each of 13 languages enable direct performance comparison across languages. Documents a 30-point English-Swahili gap.