R0027/2026-03-26/Q001/SRC07
Gupta et al. — Multilingual LLM performance biases in education
Source
| Field | Value |
|---|---|
| Title | Multilingual Performance Biases of Large Language Models in Education |
| Publisher | arXiv (ETH Zurich / Bocconi) |
| Author(s) | Vansh Gupta, Sankalan Pal Chowdhury, Vilem Zouhar, Donya Rooein, Mrinmaya Sachan |
| Date | 2025-04 |
| URL | https://arxiv.org/html/2504.17720v2 |
| Type | Research paper |
Summary
| Dimension | Rating |
|---|---|
| Reliability | High |
| Relevance | High |
| Bias: Missing data | Low risk |
| Bias: Measurement | Low risk |
| Bias: Selective reporting | Low risk |
| Bias: Randomization | N/A (not an RCT) |
| Bias: Protocol deviation | N/A (not an RCT) |
| Bias: COI/Funding | Low risk |
Rationale
| Dimension | Rationale |
|---|---|
| Reliability | ETH Zurich affiliation; 313,500 model outputs analyzed. Covers Hindi, Mandarin, and Arabic, all languages named in Q001. |
| Relevance | Provides per-language accuracy data for the specific languages the query asks about, and includes a prompt-language comparison. |
| Bias flags | Low risk: large dataset, multiple models, transparent methodology. |
| Evidence ID | Summary |
|---|---|
| SRC07-E01 | Per-language accuracy: English 70.9%, Hindi 63.1%, Mandarin 64.6%, Arabic 67.4% |
| SRC07-E02 | English prompts outperform translated prompts: 72.7% vs 67.2% |