R0028/2026-03-26/C022: Claim Definition
Claim as Received
Published research documents performance gaps of 3 to 30 percentage points between English and non-English languages, depending on the language and task. Arabic shows the smallest gap (3 points); low-resource languages show the largest (30 points).
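To make the claim testable, fix what a "percentage-point gap" means. Assume it is the English score minus the other language's score, with both expressed as percentages. Under that reading, 85% versus 82% is a 3-point gap, not a 3% relative drop (which would be about 3.5%). The Python sketch below checks both parts of the claim under that assumption. The function names `percentage_point_gaps` and `claim_holds`, the `"en"`/`"ar"`/`"low_resource"` keys, and every accuracy value are hypothetical illustrations, not data from any cited study.

```python
# Illustrative sketch only: the accuracy values below are hypothetical
# placeholders, not results from any published benchmark.


def percentage_point_gaps(scores: dict[str, float], reference: str = "en") -> dict[str, float]:
    """Return reference-minus-language gaps in percentage points.

    `scores` maps a (hypothetical) language key to accuracy as a percentage
    (0-100). The gap is an absolute difference, not a relative drop.
    """
    ref = scores[reference]
    return {lang: round(ref - acc, 2) for lang, acc in scores.items() if lang != reference}


def claim_holds(gaps: dict[str, float], low: float = 3.0, high: float = 30.0,
                smallest: str = "ar") -> bool:
    """Check the claim's two parts.

    1. Every gap lies within [low, high] percentage points.
    2. The language keyed by `smallest` (Arabic by default) has the minimum gap.
    """
    in_range = all(low <= gap <= high for gap in gaps.values())
    return in_range and min(gaps, key=gaps.get) == smallest


if __name__ == "__main__":
    # Hypothetical accuracies (%) for English, Arabic, and a low-resource language.
    hypothetical = {"en": 85.0, "ar": 82.0, "low_resource": 55.0}
    gaps = percentage_point_gaps(hypothetical)
    print(gaps)               # {'ar': 3.0, 'low_resource': 30.0}
    print(claim_holds(gaps))  # True
```

A real check only works if every language is scored on the same task and metric. Gaps taken from different benchmarks are not comparable.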
Claim as Clarified
Large language models (LLMs) show performance gaps of 3 to 30 percentage points between English and non-English languages, with the size of the gap depending on the language and task. Arabic shows the smallest gap (3 points); low-resource languages show the largest (30 points).
BLUF
Partially correct. Research confirms significant performance gaps between English and non-English languages in LLMs, and the LILT analysis attributes 72-87% of errors to model limitations. The claimed 3-30 point range is broadly consistent with documented gaps. However, the specific claim that Arabic shows the smallest gap (3 points) is contradicted by evidence that Arabic requires about three times as many tokens as English and that its performance can collapse well below English levels.
Scope
- Domain: Prompt engineering and related fields
- Timeframe: As of 2026-03-26
- Testability: Verifiable through primary sources
Assessment Summary
Probability: Likely (55-80%)
Confidence: Medium
Hypothesis outcome: See assessment.md, which contains the full assessment.
Status
| Field | Value |
|---|---|
| Date created | 2026-03-26 |
| Date completed | 2026-03-26 |
| Researcher profile | None provided |
| Prompt version | Unified Research Standard v1.0-draft |
| Revisit by | 2027-03-26 |
| Revisit trigger | New evidence or source changes |