R0027/2026-03-26/Q001/SRC07

Research R0027 — Multilingual prompt engineering challenges
Run 2026-03-26
Query Q001
Search S02
Result S02-R04
Source SRC07

Gupta et al. — Multilingual LLM performance biases in education

Source

Field Value
Title Multilingual Performance Biases of Large Language Models in Education
Publisher arXiv (ETH Zurich / Bocconi)
Author(s) Vansh Gupta, Sankalan Pal Chowdhury, Vilem Zouhar, Donya Rooein, Mrinmaya Sachan
Date 2025-04
URL https://arxiv.org/html/2504.17720v2
Type Research paper

Summary

Dimension Rating
Reliability High
Relevance High
Bias: Missing data Low risk
Bias: Measurement Low risk
Bias: Selective reporting Low risk
Bias: Randomization N/A — not an RCT
Bias: Protocol deviation N/A — not an RCT
Bias: COI/Funding Low risk

Rationale

Dimension Rationale
Reliability ETH Zurich affiliation; 313,500 model outputs; covers Hindi, Mandarin, and Arabic, all languages named in Q001.
Relevance Provides per-language accuracy data for the specific languages the query asks about. Includes prompt language comparison.
Bias flags Low risk across dimensions: large dataset, multiple models, transparent methodology.

Evidence Extracts

Evidence ID Summary
SRC07-E01 Per-language accuracy data: English 70.9%, Hindi 63.1%, Mandarin 64.6%, Arabic 67.4%
SRC07-E02 English prompts outperform prompts translated into the target language: 72.7% vs 67.2% accuracy