

Research R0053 — Prompt Claims
Run 2026-03-31-02
Claim C002

Claim: Any requirement stated to an AI without enforcement language will be treated as a suggestion — you must tell the AI what it is not allowed to do, not just what to do.

BLUF: The diagnosis is correct — AI does treat weakly stated requirements as suggestions. But the prescription is wrong — negative constraints ("must not") are often less effective than positive reframing. Enforcement requires explicit, non-negotiable phrasing, not negative framing specifically.

Probability: Roughly even chance (45-55%) | Confidence: Medium
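
To make the BLUF concrete, here is a minimal sketch contrasting three hypothetical phrasings of one requirement. The wording, the 200-word limit, and the Python constant names are invented for illustration and are not drawn from the assessed sources.

  # Illustrative sketch only: three hypothetical phrasings of the same requirement.

  # Weak statement -- the kind of requirement the model tends to treat as a suggestion.
  WEAK = "Please try to keep each response under 200 words."

  # Negative constraint -- the claim's prescription (state what is not allowed).
  NEGATIVE = "Do not exceed 200 words in any response."

  # Explicit, non-negotiable positive phrasing -- the mechanism H2 supports.
  ENFORCED_POSITIVE = (
      "Hard requirement: every response is 200 words or fewer. "
      "If a complete answer will not fit, summarize rather than run long."
  )

Per the BLUF, the enforcement strength comes from the explicit, non-negotiable framing of the third variant, not from the negation in the second.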


Summary

Entity | Description
Claim Definition | Claim text, scope, status
Assessment | Full analytical product with reasoning chain
ACH Matrix | Evidence × hypotheses diagnosticity analysis
Self-Audit | ROBIS-adapted 5-domain audit (process + source verification)

Hypotheses

ID | Hypothesis | Status
H1 | Claim is accurate — negative constraints are necessary for enforcement | Eliminated
H2 | Claim is partially correct — enforcement needed but mechanism is wrong | Supported
H3 | Claim is materially wrong — AI reliably follows all clearly stated requirements | Eliminated

Searches

ID | Target | Results | Selected
S01 | Negative vs positive instruction effectiveness | 10 | 2
S02 | LLM instruction hierarchy failures | 10 | 2

Sources

Source | Description | Reliability | Relevance
SRC01 | Pink Elephant Problem analysis | Medium | High
SRC02 | Control Illusion paper (Geng et al., 2025) | High | High
SRC03 | Anthropic prompt engineering guidance | High | Medium

Revisit Triggers

  • New controlled studies on enforcement language effectiveness in LLMs
  • Changes to RLHF/DPO training that affect instruction compliance
  • Updates to Anthropic's prompt engineering guidance on positive vs negative framing