
Research R0054 — Prompt Claims v2
Run 2026-03-31
Claim C003
Search S02

WebSearch — LLM semantic override and instruction ignoring behavior

Summary

| Field | Value |
| --- | --- |
| Source/Database | WebSearch |
| Query terms | LLM agrees with instructions then ignores them default behavior helpfulness override workflow |
| Filters | None |
| Results returned | 9 |
| Results selected | 1 |
| Results rejected | 8 |

Selected Results

| Result | Title | URL | Rationale |
| --- | --- | --- | --- |
| S02-R01 | When Models Ignore Definitions: Measuring Semantic Override Hallucinations | https://arxiv.org/html/2602.17520 | Directly demonstrates models reverting to default behavior despite explicit instructions |

Rejected Results

| Result | Title | URL | Rationale |
| --- | --- | --- | --- |
| S02-R02 | The Instruction Hierarchy | https://arxiv.org/html/2404.13208v1 | About instruction priority, not workflow skipping |
| S02-R03 | Wait, that's not an option: LLMs Robustness | https://arxiv.org/html/2409.00113v3 | About incorrect options, not workflow compliance |
| S02-R04 | How Ignore All Previous Instructions is Breaking AI | https://learnprompting.org/blog/ignore_previous_instructions | About prompt injection, not sycophantic non-compliance |
| S02-R05 | Securing LLMs Against Prompt Injection | https://blog.securityinnovation.com/securing-llms-against-prompt-injection-attacks | Security focus, not behavioral compliance |
| S02-R06 | Context Ignoring Attack | https://learnprompting.org/docs/prompt_hacking/offensive_measures/context-ignoring-attack | Adversarial context, not sycophantic behavior |
| S02-R07 | Fix LLM Bias: Override AI Positivity | https://blog.buildbetter.ai/mitigating-llm-biases-why-large-language-models-default-to-positivity-2-or-3-answers-and-how-to-push-past-them/ | About positivity bias in content, not process compliance |
| S02-R08 | LLM Prompt Injection Prevention - OWASP | https://cheatsheetseries.owasp.org/cheatsheets/LLM_Prompt_Injection_Prevention_Cheat_Sheet.html | Security focus |
| S02-R09 | How does a LLM know how to follow instructions | https://www.quora.com/How-does-a-large-language-model-LLM-like-Chat-GPT-know-how-to-follow-instructions-like-ignore-previous-rules-With-no-understanding-only-probabilistically-finding-the-next-word-doesnt-seem-sufficient-Is-there | Q&A, insufficient depth |

The search returned only nine results; no tenth result was available.

Notes

The semantic override paper (S02-R01) is the key finding: it provides experimental evidence that models revert to default behavior despite explicit instructions, which is the mechanism underlying claim C003.
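The reversion failure mode that S02-R01 measures can be made concrete with a minimal scoring sketch. This is a hypothetical harness for illustration only, not the paper's actual protocol: given a model's answer to a prompt that redefines a term, it classifies whether the answer follows the in-prompt definition or reverts to the term's default meaning, using marker substrings chosen by the evaluator.

```python
def classify_override(answer: str,
                      instructed_markers: list[str],
                      default_markers: list[str]) -> str:
    """Label an answer as 'complied', 'reverted', or 'ambiguous'.

    instructed_markers: substrings expected if the model honors the
        prompt's custom definition (evaluator-chosen assumption).
    default_markers: substrings expected if the model falls back to
        the term's ordinary meaning.
    """
    text = answer.lower()
    follows = any(m.lower() in text for m in instructed_markers)
    reverts = any(m.lower() in text for m in default_markers)
    if follows and not reverts:
        return "complied"
    if reverts and not follows:
        return "reverted"
    return "ambiguous"


# Hypothetical probe: the prompt redefined "blue" as light at 700 nm
# (ordinarily around 450-470 nm). An answer citing 700 nm complied;
# one citing 450-470 nm silently reverted to the default definition.
print(classify_override("Under your definition, blue is 700 nm.",
                        ["700"], ["450", "470"]))   # complied
print(classify_override("Blue light is around 450 nm.",
                        ["700"], ["450", "470"]))   # reverted
```

Aggregating such labels over many redefinition probes yields a reversion rate, which is the kind of quantity the semantic-override evidence rests on.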