R0028/2026-03-26/C024
Claim: Non-English languages pay a "token tax": more tokens are required to express the same meaning.
BLUF: Confirmed. The "token tax" is well documented in published research, and a paper titled "The Token Tax: Systematic Bias in Multilingual Tokenization" (arXiv, 2025) uses the term directly. Research shows Arabic requires approximately 3x as many tokens as English for equivalent text. Doubling the token count quadruples training cost, and higher token fertility (tokens per word) consistently predicts lower accuracy.
Probability: Almost certain (95-99%) | Confidence: High
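The arithmetic behind the BLUF figures can be sketched as follows. This is a minimal illustration, not the paper's method: the function names, the character-level toy tokenizer, and the assumption that cost scales with the square of the token ratio (chosen to match "doubling in tokens, quadrupled cost") are all hypothetical.

```python
from typing import Callable, List


def fertility(text: str, tokenize: Callable[[str], List[str]]) -> float:
    """Token fertility: tokens produced per whitespace-delimited word."""
    words = text.split()
    if not words:
        raise ValueError("text contains no words")
    return len(tokenize(text)) / len(words)


def relative_training_cost(token_ratio: float, exponent: float = 2.0) -> float:
    """Cost multiplier for a given token ratio.

    ASSUMPTION: cost grows as token_ratio ** exponent. The default of 2.0
    reproduces the reported "2x tokens -> 4x cost" relationship.
    """
    if token_ratio <= 0:
        raise ValueError("token_ratio must be positive")
    return token_ratio ** exponent


def char_tokens(text: str) -> List[str]:
    """Toy stand-in tokenizer (hypothetical): one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


print(fertility("token tax", char_tokens))  # 8 tokens / 2 words = 4.0
print(relative_training_cost(2.0))          # 4.0
print(relative_training_cost(3.0))          # 9.0 (extrapolation, see note)
```

Applying the same quadratic assumption to the roughly 3x Arabic token ratio gives a 9x multiplier. That figure is an extrapolation for illustration only, not a result stated in the sources.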
Summary
Hypotheses
| ID | Hypothesis | Status |
|---|---|---|
| H1 | Claim is accurate: token tax is documented | Supported |
| H2 | Partially correct: degree varies by language | Inconclusive |
| H3 | Claim is materially wrong | Eliminated |
Searches
| ID | Target | Results | Selected |
|---|---|---|---|
| S01 | Primary search | 10 | 3 |
Sources
| Source | Description | Reliability | Relevance |
|---|---|---|---|
| SRC01 | The Token Tax: Systematic Bias in Multilingual Tokenization (arXiv) | High | High |