R0054/2026-03-31/C003/S02
WebSearch — LLM semantic override and instruction ignoring behavior
Summary
| Field | Value |
|---|---|
| Source/Database | WebSearch |
| Query terms | LLM agrees with instructions then ignores them default behavior helpfulness override workflow |
| Filters | None |
| Results returned | 10 |
| Results selected | 1 |
| Results rejected | 9 |
Selected Results
| Result | Title | URL | Rationale |
|---|---|---|---|
| S02-R01 | When Models Ignore Definitions: Measuring Semantic Override Hallucinations | https://arxiv.org/html/2602.17520 | Directly demonstrates models reverting to default behavior despite explicit instructions |
Rejected Results
Notes
The semantic override paper was the key finding: it provides experimental evidence that models revert to default behavior despite explicit instructions, which is the mechanism underlying the claim.