R0053/2026-03-31-02/C002/S02/R01
Key paper on instruction hierarchy failures in LLMs
Summary
| Field | Value |
|---|---|
| Title | Control Illusion: The Failure of Instruction Hierarchies in Large Language Models |
| URL | https://arxiv.org/abs/2502.15851 |
| Date accessed | 2026-03-31 |
| Publication date | 2025-02-21 |
| Author(s) | Yilin Geng, Haonan Li, Honglin Mu, Xudong Han, Timothy Baldwin, Omri Abend, Eduard Hovy, Lea Frermann |
| Publication | arXiv (accepted to AAAI-26) |
Selection Decision
Included in evidence base: Yes
Rationale: Directly relevant. The paper demonstrates that instruction hierarchies fail in LLMs, which supports the claim that requirements need enforcement but challenges instruction hierarchies as the enforcement mechanism.