
R0053/2026-03-31-02/C002 — Assessment

BLUF

The claim correctly identifies that AI systems treat loosely stated requirements as optional — this is well documented in instruction-following research. However, the specific remedy ("tell the AI what it is not allowed to do") contradicts empirical evidence showing negative instructions are often less effective than positive reframing. The effective ingredient is enforcement language broadly (explicit, non-negotiable phrasing), not negative constraints specifically.

Probability

Rating: Roughly even chance (45-55%)

Confidence in assessment: Medium

Confidence rationale: Strong evidence exists on both sides. The problem diagnosis (requirements treated as suggestions) is well-supported. The prescribed solution (negative framing) is contradicted by research. The claim's truth depends on whether you evaluate the diagnosis or the prescription.

Reasoning Chain

  1. The "Pink Elephant Problem" analysis demonstrates that LLMs produce worse output with more "DO NOTs" in prompts, and that negative instructions activate the very concepts they aim to prohibit (Ironic Process Theory). [SRC01-E01, Medium reliability, High relevance]

  2. Anthropic explicitly advises: "Tell Claude what to do instead of what not to do." Real-world examples show positive reframing resolves issues that negative constraints fail to address. [SRC01-E02, Medium reliability, High relevance]

  3. The "Control Illusion" paper (Geng et al., 2025) demonstrates that system/user prompt separation fails to establish reliable instruction hierarchies, and models exhibit inherent biases toward certain constraint types regardless of priority designation. [SRC02-E01, High reliability, High relevance]

  4. However, negative constraints retain value for establishing firm boundaries and preventing harmful behavior — suggesting a role for "must not" language in specific contexts. [SRC01-E01, Medium reliability, Medium relevance]

  5. JUDGMENT: The claim's diagnosis is correct (requirements need enforcement). The prescription (negative framing) is incomplete and partially counterproductive. The most effective approach combines explicit positive instructions with strategic negative boundaries.
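The judgment above can be made concrete with a small illustration. This is a hypothetical sketch only: the prompt wording and variable names are assumptions for illustration, not drawn from any cited source.

```python
# Hypothetical illustration: the same requirement phrased three ways,
# matching the judgment above. Wording is an assumption, not from a source.

# Negative-only framing: per the Pink Elephant analysis, this risks
# activating the very concepts it prohibits.
negative_only = "Do NOT write vague summaries. Do NOT exceed three bullets."

# Positive reframing: state the desired behavior explicitly.
positive = ("Write a concrete summary as exactly three bullet points, "
            "each under 20 words.")

# Combined approach: explicit positive instruction plus one firm boundary,
# reserving "never" language for genuinely non-negotiable constraints.
combined = positive + " Never include personally identifying information."

for label, prompt in [("negative-only", negative_only),
                      ("positive", positive),
                      ("combined", combined)]:
    print(f"{label}: {prompt}")
```

The combined variant reflects the synthesis: the positive instruction carries the requirement, and the single negative constraint marks the hard boundary.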

Evidence Base Summary

| Source | Description | Reliability | Relevance | Key Finding |
|---|---|---|---|---|
| SRC01 | Pink Elephant Problem analysis | Medium | High | Negative instructions less effective than positive |
| SRC02 | Control Illusion paper (Geng et al.) | High | High | Instruction hierarchies fail in LLMs |
| SRC03 | Anthropic prompt engineering guidance | High | Medium | Official guidance favors positive over negative framing |

Collection Synthesis

| Dimension | Assessment |
|---|---|
| Evidence quality | Medium — mix of academic research and practitioner analysis |
| Source agreement | High — sources consistently find negative instructions suboptimal |
| Source independence | Medium — the Pink Elephant analysis cites Anthropic guidance; Control Illusion is independent |
| Outliers | None |

Detail

The evidence converges on a nuanced position: enforcement language is necessary (supporting the claim's diagnosis), but negative framing specifically is often counterproductive (contradicting the claim's prescription). The most effective approach combines clear positive instructions with strategic use of negative constraints for firm boundaries only.

Gaps

| Missing Evidence | Impact on Assessment |
|---|---|
| Controlled studies comparing positive vs. negative enforcement in identical prompts | High — would directly test the claim's mechanism |
| Research on "enforcement language" as a category beyond positive/negative framing | Medium — the claim uses this term without defining it |
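The first gap could be probed with a small A/B harness. The sketch below is hedged: `fake_model`, `violates_spec`, and the prompt wording are all stand-ins assumed for illustration, so the example runs end to end without calling a real model; an actual study would substitute a live LLM client and a proper scoring rubric.

```python
# Hedged sketch of the missing controlled study: identical task, two
# enforcement framings, scored on the same rubric. `fake_model` is a
# deterministic stand-in so the harness is self-contained and runnable.

def fake_model(prompt: str) -> str:
    # Stand-in model: returns three bullets only when the positive framing's
    # explicit count is present. Purely illustrative behavior.
    if "exactly three" in prompt:
        return "- point one\n- point two\n- point three"
    return "- point one\n- point two"

def violates_spec(output: str) -> bool:
    # Toy rubric: the answer must contain exactly three bullets.
    return output.count("- ") != 3

framings = {
    "negative": "Summarize the report. Do NOT write more or fewer than three bullets.",
    "positive": "Summarize the report as exactly three bullet points.",
}

# Map each framing to whether its output violated the spec.
results = {name: violates_spec(fake_model(prompt))
           for name, prompt in framings.items()}
print(results)  # → {'negative': True, 'positive': False}
```

A real version of this harness, run over many tasks and paired prompts, is exactly the controlled comparison the assessment identifies as missing.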

Researcher Bias Check

Declared biases: No researcher profile provided.

Influence assessment: The claim originates from the methodology's own design philosophy. The methodology itself uses enforcement language extensively ("You are NOT ALLOWED to..."), creating a potential confirmation bias toward believing enforcement language works.

Cross-References

| Entity | ID | File |
|---|---|---|
| Hypotheses | H1, H2, H3 | hypotheses/ |
| Sources | SRC01, SRC02, SRC03 | sources/ |
| ACH Matrix | | ach-matrix.md |
| Self-Audit | | self-audit.md |