R0028/2026-03-26/C024/SRC01/E01
Primary evidence supporting the claim assessment.
URL: https://arxiv.org/html/2509.05486v1
Extract
Confirmed. The term "token tax" is well documented in published research: the paper "The Token Tax: Systematic Bias in Multilingual Tokenization" (arXiv, 2025) uses it directly. Research shows that Arabic requires approximately three times as many tokens as English for equivalent text. A doubling in tokens results in quadrupled training cost, and higher token fertility (tokens per word) consistently predicts lower accuracy.
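The arithmetic behind the "doubling means quadrupling" statement can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes training cost scales with the square of the token count, which matches the doubling-to-quadrupling relationship quoted above. The function names `fertility` and `relative_training_cost` are hypothetical, and applying the same exponent to the 3x Arabic figure is an extrapolation, not a result reported in the extract.

```python
def fertility(token_count: int, word_count: int) -> float:
    """Token fertility: average number of tokens produced per word."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return token_count / word_count


def relative_training_cost(token_ratio: float, exponent: float = 2.0) -> float:
    """Cost multiplier for a given token-count ratio.

    ASSUMPTION: cost grows as token_ratio ** exponent, with exponent=2.0
    chosen to reproduce "doubling tokens quadruples cost" from the extract.
    """
    if token_ratio <= 0:
        raise ValueError("token_ratio must be positive")
    return token_ratio ** exponent


if __name__ == "__main__":
    # Doubling the tokens, as stated in the extract.
    print(relative_training_cost(2.0))  # 4.0
    # Illustrative extrapolation to the ~3x ratio quoted for Arabic vs. English.
    print(relative_training_cost(3.0))  # 9.0
```

Under this assumed quadratic relationship, a language that needs about three times as many tokens would face roughly nine times the cost. That figure should be read as an illustration of how the penalty compounds, not as a measured value.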
Relevance to Hypotheses
| Hypothesis | Relationship | Strength |
|---|---|---|
| H1 | Supports | Direct evidence |
| H2 | Partially supports | Direct evidence |
| H3 | Contradicts | Evidence contradicts the claim being materially wrong |
Context
Evidence gathered 2026-03-26.