R0028/2026-03-26/C022
Claim: Published research documents performance gaps of 3 to 30 percentage points between English and non-English languages, depending on the language and task. Arabic shows the smallest gap (3 points); low-resource languages show the largest (30 points).
BLUF: Partially correct. Research confirms significant performance gaps between English and non-English languages in LLMs, and the 3-30 point range is broadly consistent with documented gaps. The LILT analysis found that model limitations drive 72-87% of errors. However, the specific claim that Arabic shows the smallest gap (3 points) is contradicted by evidence that Arabic requires roughly 3x as many tokens as English and that its accuracy can collapse to much lower levels.
Probability: Likely (55-80%) | Confidence: Medium
Correction needed: Characterizing Arabic as having the "smallest gap" contradicts evidence that Arabic requires roughly 3x as many tokens as English and sometimes collapses to significantly lower accuracy.
Summary
| Entity | Description |
|---|---|
| Claim Definition | Claim text, scope, status |
| Assessment | Full analytical product with reasoning chain |
| ACH Matrix | Evidence-by-hypothesis diagnosticity analysis |
| Self-Audit | ROBIS-adapted 4-domain process audit |
Hypotheses
| ID | Hypothesis | Status |
|---|---|---|
| H1 | Claim is accurate, including Arabic having the smallest gap | Inconclusive |
| H2 | Performance gaps are real and in the documented range, but Arabic having the smallest gap is not supported | Supported |
| H3 | Claim is materially wrong | Eliminated |
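The statuses above are the kind of result an ACH (Analysis of Competing Hypotheses) matrix is meant to produce. A minimal sketch of one common scoring convention follows: each evidence item is rated against each hypothesis (CC, C, N, I, II), inconsistencies are summed with "II" weighted double, and the hypothesis with the lowest inconsistency score is favored. The evidence labels, ratings, and weights are hypothetical illustrations chosen for this sketch; they are not the contents of this assessment's ACH Matrix.

```python
# Illustrative ACH inconsistency scoring.
# ASSUMPTION: the evidence labels, ratings, and weights are HYPOTHETICAL,
# not the ratings recorded in this assessment's ACH Matrix.

# Only inconsistent ratings count against a hypothesis; "II" counts double.
WEIGHTS = {"CC": 0, "C": 0, "N": 0, "I": 1, "II": 2}

# evidence item -> {hypothesis id: rating}
MATRIX = {
    "E1 gaps of 3-30 points documented": {"H1": "C", "H2": "C", "H3": "II"},
    "E2 Arabic needs ~3x tokens": {"H1": "I", "H2": "C", "H3": "N"},
    "E3 Arabic accuracy can collapse": {"H1": "I", "H2": "C", "H3": "N"},
    "E4 gap varies by language and task": {"H1": "C", "H2": "C", "H3": "I"},
}


def inconsistency_scores(matrix):
    """Sum weighted inconsistencies per hypothesis (lower is better)."""
    scores = {}
    for ratings in matrix.values():
        for hyp, rating in ratings.items():
            scores[hyp] = scores.get(hyp, 0) + WEIGHTS[rating]
    return scores


def is_diagnostic(ratings):
    """An item is diagnostic only if its rating differs across hypotheses."""
    return len(set(ratings.values())) > 1


scores = inconsistency_scores(MATRIX)
ranking = sorted(scores, key=scores.get)  # least inconsistent first

print(scores)   # {'H1': 2, 'H2': 0, 'H3': 3}
print(ranking)  # ['H2', 'H1', 'H3']
print({item: is_diagnostic(r) for item, r in MATRIX.items()})
```

Counting inconsistencies rather than consistencies follows the ACH emphasis on disconfirming evidence: an item rated the same way for every hypothesis cannot discriminate between them, which is what `is_diagnostic` flags.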
Searches
| ID | Target | Results returned | Results selected |
|---|---|---|---|
| S01 | Primary search | 10 | 3 |
Sources
| Source | Description | Reliability | Relevance |
|---|---|---|---|
| SRC01 | LILT Multilingual LLM Performance Gap Analysis | High | High |