R0027/2026-03-26/Q002/H3

Statement

Challenges from linguistic structural differences exist but are primarily mediated through tokenization efficiency and training data representation rather than linguistic structure per se. The causal chain is: linguistic structure → tokenization/training data → performance impact.

Status

Current: Supported

H3 is the best-supported hypothesis. The evidence establishes a clear causal chain: languages with complex morphology, non-Latin scripts, or agglutinative structures require more tokens per word, which directly reduces accuracy (an 8-18 percentage-point drop per additional token per word) and increases cost (self-attention compute scales quadratically with sequence length). Meanwhile, direct linguistic nuances account for only ~2% of failures. The challenge is real and traceable to linguistic structure, but the mechanism is computational.

Supporting Evidence

Evidence Summary
SRC03-E01 Tokenization fertility predicts accuracy; morphological complexity → more tokens → lower accuracy
SRC04-E01 72-87% of failures from model limitations (tokenizer, latent space); ~2% from language nuances
SRC05-E01 Agglutinative languages break tokenization; workaround is translation to English
SRC02-E01 Arabic complexity affects even Arabic-centric models — the challenge is in processing, not understanding
SRC01-E02 Mandarin's tonal challenge is largely irrelevant for text — the textual challenge is about characters/tokenization

Contradicting Evidence

No evidence directly contradicts H3. All evidence is consistent with the mediated-mechanism model.

Reasoning

The evidence converges on a clear picture: linguistic structural features (SOV, agglutination, morphological richness, non-Latin scripts) are real and create real challenges, but the mechanism is not that models "cannot understand" these structures. Rather, the mechanism is that current tokenization systems impose a disproportionate computational cost on these languages, which reduces both accuracy and economic viability. This distinction matters for solutions — the fix is better tokenizers and more representative training data, not fundamentally different model architectures.
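The cost side of this mechanism can be sketched in a few lines. The fertility values below are illustrative assumptions, not measured figures from the cited sources; the point is only the shape of the relationship: higher tokens-per-word (fertility) means longer sequences, and self-attention cost grows with the square of sequence length, so a language tokenized at 3x the fertility of English pays roughly 9x the attention cost for the same text.

```python
# Toy sketch: how tokenization fertility (tokens per word) drives
# quadratic attention cost. All fertility numbers are hypothetical.

def attention_cost(words: int, fertility: float) -> float:
    """Relative self-attention cost for a passage of `words` words:
    proportional to the square of the token sequence length."""
    seq_len = words * fertility
    return seq_len ** 2

# Hypothetical fertility values for a 100-word passage.
fertilities = {"English": 1.3, "Turkish": 2.6, "Amharic": 3.9}

baseline = attention_cost(100, fertilities["English"])
for lang, f in fertilities.items():
    rel = attention_cost(100, f) / baseline
    print(f"{lang}: fertility={f}, relative attention cost={rel:.1f}x")
```

Under these assumed numbers, doubling fertility quadruples attention cost and tripling it gives a ninefold increase, which is the "quadratic scaling" penalty referenced above: the economic disadvantage follows mechanically from sequence length, not from any property of the language itself.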

Relationship to Other Hypotheses

H3 integrates H1 (challenges exist) and H2 (computation dominates) into a coherent causal model. It agrees with H1 that linguistic structure matters and with H2 that computation is the proximate cause, while adding the causal link between them.