

Research R0027 — Multilingual prompt engineering challenges
Run 2026-03-26
Query Q001
Search S02
Result S02-R02

13-language multilingual MMLU benchmark with 11,829 questions per language

Summary

| Field | Value |
|---|---|
| Title | MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation |
| URL | https://arxiv.org/html/2503.10497v1 |
| Date accessed | 2026-03-26 |
| Publication date | 2025-03-13 |
| Author(s) | Weihao Xuan, Rui Yang, et al. |
| Publication | arXiv preprint |

Selection Decision

Included in evidence base: Yes

Rationale: The largest standardized cross-lingual benchmark found in this search. Because the same 11,829 questions appear in each of the 13 languages, performance can be compared directly across languages. It documents a 30-point accuracy gap between English and Swahili.
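Because every language version shares the same question set, a cross-language gap can be computed item by item rather than from two independent samples. The sketch below illustrates that idea under stated assumptions: per-question correctness is available as boolean lists aligned by question index, and the helper names (`accuracy`, `paired_gap`) and the 10-question toy data are hypothetical, not taken from MMLU-ProX or its results.

```python
from typing import Sequence


def accuracy(correct: Sequence[bool]) -> float:
    """Percentage of questions answered correctly."""
    return 100.0 * sum(correct) / len(correct)


def paired_gap(lang_a: Sequence[bool], lang_b: Sequence[bool]) -> dict:
    """Compare two languages on the same question set (hypothetical helper).

    Assumes lang_a[i] and lang_b[i] are results for translations of the
    same question i, which is what a parallel benchmark makes possible.
    """
    if len(lang_a) != len(lang_b):
        raise ValueError("both languages must cover the same questions")
    # Discordant pairs: questions solved in one language but not the other.
    only_a = sum(a and not b for a, b in zip(lang_a, lang_b))
    only_b = sum(b and not a for a, b in zip(lang_a, lang_b))
    return {
        "acc_a": accuracy(lang_a),
        "acc_b": accuracy(lang_b),
        "gap_points": accuracy(lang_a) - accuracy(lang_b),
        "only_a_correct": only_a,
        "only_b_correct": only_b,
    }


# Toy data for 10 shared questions (hypothetical, NOT benchmark results).
english = [True, True, True, True, True, True, True, False, True, False]
swahili = [True, True, True, False, True, False, True, False, True, False]
print(paired_gap(english, swahili))
```

The discordant counts are the quantities a paired test such as McNemar's test works from; reporting them next to the accuracy gap is one possible choice, not something the source prescribes.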