R0027/2026-03-26/Q001/S02/R05

Research R0027 — Multilingual prompt engineering challenges
Run 2026-03-26
Query Q001
Search S02
Result S02-R05

Quantification of tokenization bias as a systematic tax on non-English languages

Summary

Title: The Token Tax: Systematic Bias in Multilingual Tokenization
URL: https://arxiv.org/html/2509.05486v1
Date accessed: 2026-03-26
Publication date: 2025-09
Author(s): Jessica M. Lundin, Ada Zhang, Nihal Karim, Hamza Louzan, Victor Wei, David Adelani, Cody Carroll
Publication: arXiv preprint

Selection Decision

Included in evidence base: Yes

Rationale: The paper quantifies the structural mechanism (tokenization) behind performance degradation in non-English languages. It reports that each additional token per word reduces accuracy by 8 to 18 percentage points. This makes it critical for understanding root causes rather than symptoms.
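
To make the "tokens per word" figure concrete, the sketch below computes that ratio for a piece of text and scales the 8 to 18 percentage point range cited in the rationale. It is an illustration only: `toy_tokenize` is a hypothetical stand-in for a real model tokenizer, and treating the reported effect as linear in the number of extra tokens per word is an assumption of this sketch, not something taken from the paper.

```python
from typing import Callable, List, Tuple


def tokens_per_word(text: str, tokenize: Callable[[str], List[str]]) -> float:
    """Average number of tokens per whitespace-delimited word."""
    words = text.split()
    if not words:
        raise ValueError("text contains no words")
    return len(tokenize(text)) / len(words)


def implied_accuracy_drop(
    extra_tokens_per_word: float,
    low_pp: float = 8.0,
    high_pp: float = 18.0,
) -> Tuple[float, float]:
    """Scale the cited 8-18 point range by the extra tokens per word.

    Assumption: the effect is linear, so the result is a rough range,
    not a prediction.
    """
    return (extra_tokens_per_word * low_pp, extra_tokens_per_word * high_pp)


def toy_tokenize(text: str) -> List[str]:
    """Hypothetical tokenizer: cuts every word into chunks of up to 3 characters."""
    return [
        word[i:i + 3]
        for word in text.split()
        for i in range(0, len(word), 3)
    ]
```

With a real tokenizer passed in place of `toy_tokenize`, comparing `tokens_per_word` for the same sentence in English and in another language gives the extra tokens per word that `implied_accuracy_drop` takes as input.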