R0028/2026-03-26/C024 — Claim Definition
Claim as Received
Non-English languages pay a "token tax": more tokens are required to express the same meaning.
Claim as Clarified
Under commonly used LLM tokenizers, text in many non-English languages is split into more tokens than semantically equivalent English text (higher token fertility, i.e. more tokens per word). This imposes a "token tax" on those languages in cost and, potentially, in model accuracy.
BLUF
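Confirmed. The 'token tax' is well documented in published research. A paper titled 'The Token Tax: Systematic Bias in Multilingual Tokenization' (arXiv, 2025) uses the term directly. Published measurements indicate that Arabic can require approximately 3x as many tokens as English for equivalent text, with the exact ratio depending on the tokenizer. Research also reports that a doubling in tokens results in quadrupled training cost (the result expected if cost scales with the square of sequence length, as self-attention does), and that higher token fertility consistently predicts lower accuracy.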
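The BLUF rests on two quantities: token fertility (tokens per word) and the ratio of tokens a language needs relative to English for the same content. The Python sketch below makes both concrete, assuming token counts are supplied by whichever tokenizer is being audited. The helper names `fertility`, `token_tax`, and `attention_cost_ratio` are hypothetical illustrations, not functions from the cited paper, and the quadratic cost model is an assumption that captures only the self-attention term.

```python
"""Sketch: token fertility, the "token tax" ratio, and quadratic cost scaling.

Assumptions (hypothetical, not taken from the cited paper):
- Token counts come from the tokenizer under audit and are passed in as ints.
- Words are approximated by whitespace splitting, which breaks down for
  scripts written without spaces (e.g. Chinese, Japanese, Thai).
- Cost is modeled as proportional to sequence length squared, i.e. only the
  self-attention term of a transformer.
"""


def fertility(num_tokens: int, text: str) -> float:
    """Tokens per whitespace-delimited word; higher means more fragmentation."""
    words = text.split()
    if not words:
        raise ValueError("text must contain at least one word")
    return num_tokens / len(words)


def token_tax(tokens_lang: int, tokens_english: int) -> float:
    """How many times as many tokens a parallel text needs compared with English."""
    if tokens_english <= 0:
        raise ValueError("English token count must be positive")
    return tokens_lang / tokens_english


def attention_cost_ratio(token_ratio: float) -> float:
    """Relative cost when cost grows with the square of sequence length.

    Per-token (linear) parts of a transformer would grow by token_ratio
    instead, so this is an upper-end estimate for the full model.
    """
    return token_ratio ** 2


if __name__ == "__main__":
    # Hypothetical counts for one parallel sentence pair (illustrative only).
    print(token_tax(tokens_lang=30, tokens_english=10))  # 3.0
    print(attention_cost_ratio(2.0))                     # 4.0
    print(fertility(12, "the cat sat on the mat"))       # 2.0
```

Under this model, doubling the token count quadruples the attention cost, which matches the BLUF's "doubling in tokens results in quadrupled training cost." If real cost has both a linear and a quadratic term in sequence length, doubling the tokens multiplies it by somewhere between 2 and 4.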
Scope
- Domain: Prompt engineering and related fields
- Timeframe: As of 2026-03-26
- Testability: Verifiable through primary sources
Assessment Summary
Probability: Almost certain (95-99%)
Confidence: High
Hypothesis outcome: See the full assessment in assessment.md.
Status
| Field | Value |
|---|---|
| Date created | 2026-03-26 |
| Date completed | 2026-03-26 |
| Researcher profile | None provided |
| Prompt version | Unified Research Standard v1.0-draft |
| Revisit by | 2027-03-26 |
| Revisit trigger | New evidence or source changes |