C022 — Claim Definition


Research	R0028 — Prompt Engineering Claims
Run	2026-03-26
Claim	C022

Claim as Received

Published research documents performance gaps of 3 to 30 percentage points between English and non-English languages, depending on the language and task. Arabic shows the smallest gap (3 points); low-resource languages show the largest (30 points).

Claim as Clarified

Partially correct. Research confirms significant performance gaps between English and non-English languages in LLMs, and the 3-30 point range is broadly consistent with documented gaps. The LILT analysis found that model limitations drive 72-87% of errors. However, the specific claim that Arabic shows the smallest gap (3 points) is contradicted by evidence that Arabic requires 3x more tokens than English and that its performance can collapse to much lower levels.

BLUF

Partially correct. Research confirms significant performance gaps between English and non-English languages in LLMs, and the 3-30 point range is broadly consistent with documented gaps. The LILT analysis found that model limitations drive 72-87% of errors. However, the specific claim that Arabic shows the smallest gap (3 points) is contradicted by evidence that Arabic requires 3x more tokens than English and that its performance can collapse to much lower levels.

Scope

Domain: Prompt engineering and related fields
Timeframe: As of 2026-03-26
Testability: Verifiable through primary sources

Assessment Summary

Probability: Likely (55-80%)

Confidence: Medium

Hypothesis outcome: See assessment.md.

[Full assessment in assessment.md.]

Status

Field	Value
Date created	2026-03-26
Date completed	2026-03-26
Researcher profile	None provided
Prompt version	Unified Research Standard v1.0-draft
Revisit by	2027-03-26
Revisit trigger	New evidence or source changes