R0053/2026-03-31-02/C002
Claim: Any requirement stated to an AI without enforcement language will be treated as a suggestion — you must tell the AI what it is not allowed to do, not just what to do.
BLUF: The diagnosis is correct — AI does treat weakly-stated requirements as suggestions. But the prescription is wrong — negative constraints ("must not") are often less effective than positive reframing. Enforcement requires explicit, non-negotiable phrasing, not specifically negative framing.
Probability: Roughly even chance (45-55%) | Confidence: Medium
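The distinction H2 turns on can be made concrete. The sketch below is a hypothetical illustration, not drawn from the assessment's sources: the same requirement phrased as a negative constraint (which names the forbidden behavior, the failure mode the Pink Elephant analysis describes) versus a positive reframe with explicit, non-negotiable enforcement language. The `enforce` helper and both prompt strings are assumptions for illustration only.

```python
# Hypothetical prompts illustrating the framing distinction in the BLUF.
# Neither string comes from the assessed sources; both are assumptions.

# Negative constraint: names the forbidden behaviors, which can prime them.
NEGATIVE = (
    "Summarize the report. Do not use bullet points. "
    "Do not speculate beyond the text."
)

# Positive reframe: states the required behavior and marks it as binding.
POSITIVE = (
    "Summarize the report in flowing prose paragraphs only. "
    "REQUIRED: every statement must be directly supported by the text. "
    "This requirement is non-negotiable."
)

def enforce(requirement: str) -> str:
    """Wrap a positively-stated requirement in explicit enforcement language
    (a hypothetical helper, not a documented API)."""
    return f"REQUIRED (non-negotiable): {requirement}"

print(enforce("Respond in prose paragraphs only."))
```

Note that the helper adds enforcement weight without introducing any "do not" phrasing, which is exactly the mechanism H2 credits over H1's negative constraints.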
Summary
| Entity | Description |
| --- | --- |
| Claim Definition | Claim text, scope, status |
| Assessment | Full analytical product with reasoning chain |
| ACH Matrix | Evidence × hypotheses diagnosticity analysis |
| Self-Audit | ROBIS-adapted 5-domain audit (process + source verification) |
Hypotheses
| ID | Hypothesis | Status |
| --- | --- | --- |
| H1 | Claim is accurate — negative constraints are necessary for enforcement | Eliminated |
| H2 | Claim is partially correct — enforcement needed but mechanism is wrong | Supported |
| H3 | Claim is materially wrong — AI reliably follows all clearly stated requirements | Eliminated |
Searches
| ID | Target | Results | Selected |
| --- | --- | --- | --- |
| S01 | Negative vs positive instruction effectiveness | 10 | 2 |
| S02 | LLM instruction hierarchy failures | 10 | 2 |
Sources
| Source | Description | Reliability | Relevance |
| --- | --- | --- | --- |
| SRC01 | Pink Elephant Problem analysis | Medium | High |
| SRC02 | Control Illusion paper (Geng et al., 2025) | High | High |
| SRC03 | Anthropic prompt engineering guidance | High | Medium |
Revisit Triggers
- New controlled studies on enforcement language effectiveness in LLMs
- Changes to RLHF/DPO training that affect instruction compliance
- Anthropic updates its prompt engineering guidance on positive vs negative framing