R0054/2026-03-31
Seven claims about the design, structure, and rationale of the unified research methodology prompt were investigated. Claims ranged from external assertions (C001-C003 about the prompt's inspiration, the need for complementary instruction types, and AI sycophancy behavior) to internal structural claims (C004-C007 about the prompt's own architecture). External claims required web research; internal claims were verified primarily against the prompt itself as a primary source.
Claims
C001 — Choe's ICD 203 prompt — Likely (55-80%)
Claim: Joohn Choe's ICD 203 prompt is one of the first and most complete published system prompts implementing a full analytical rigor framework for AI research.
Verdict: Partially correct. The prompt is genuine, comprehensive, and publicly notable. The "most complete" aspect is well-supported for the ICD 203 niche. The "one of the first" qualifier is plausible but unverifiable.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Claim accurate as stated | Inconclusive | — |
| H2: Partially correct — complete but "first" unverifiable | Supported | 55-80% |
| H3: Materially wrong | Eliminated | — |
Confidence: Medium · Sources: 2 · Searches: 2
C002 — Descriptive guidance plus constraints — Very likely (80-95%)
Claim: Descriptive guidance alone — telling the AI what to do — is not sufficient for complex, multi-step analytical processes. Detailed positive instructions produced inconsistent results until complemented with explicit constraints on what the AI could not do.
Verdict: Well-supported. Research and practitioner evidence consistently confirm that positive instructions and negative constraints serve complementary functions, and both are needed.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Claim accurate as stated | Supported | 80-95% |
| H2: Constraints help but not necessary | Inconclusive | — |
| H3: Materially wrong | Eliminated | — |
Confidence: Medium-High · Sources: 3 · Searches: 2
C003 — AI skips workflow — Very likely (80-95%)
Claim: AI will acknowledge a research workflow, agree that it's excellent, and then quietly skip half of it when compliance conflicts with its default behavior of being helpful and agreeable.
Verdict: Well-supported by four independent research streams. Sycophancy, semantic override, and helpfulness-over-accuracy behavior are well-documented.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Claim accurate — systematic behavior | Supported | 80-95% |
| H2: Occasional, not caused by helpfulness | Inconclusive | — |
| H3: Materially wrong | Eliminated | — |
Confidence: Medium-High · Sources: 4 · Searches: 2
C004 — Twelve rules, four groups — Almost certain (95-99%)
Claim: The behavioral constraints in the prompt are organized as twelve rules in four groups: Truth Hierarchy (3), Anti-Sycophancy (3), Evidence Handling (3), Process Compliance (3).
Verdict: Exactly correct. Direct examination confirms 12 rules in 4 groups of 3 with the exact names and counts.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Claim accurate as stated | Supported | 95-99% |
| H2: Different count or groupings | Eliminated | — |
| H3: Materially wrong | Eliminated | — |
Confidence: High · Sources: 1 · Searches: 1
C005 — Axioms and tested assertions — Almost certain (95-99%)
Claim: The methodology supports both assumed-true context (axioms that are not tested) and tested assertions (claims and queries) in the same investigation.
Verdict: Exactly correct. The methodology defines three input types (axioms, claims, and queries) with distinct treatment rules and explicitly supports their coexistence.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Claim accurate as stated | Supported | 95-99% |
| H2: Partially correct | Eliminated | — |
| H3: Materially wrong | Eliminated | — |
Confidence: High · Sources: 2 · Searches: 2
C006 — Output format separation — Almost certain (95-99%)
Claim: The output format is deliberately separated from the methodology — you can change how results are presented without changing how research is conducted.
Verdict: Exactly correct. The methodology and the output format are two separate documents with distinct responsibilities, and the output format describes itself as a "custom" component.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Claim accurate as stated | Supported | 95-99% |
| H2: Tight coupling prevents independence | Eliminated | — |
| H3: Format embedded in methodology | Eliminated | — |
Confidence: High · Sources: 3 · Searches: 2
C007 — Researcher profile — Almost certain (95-99%)
Claim: The researcher profile documents known personal biases, professional conflicts of interest, and acknowledged blind spots, and the AI uses it to calibrate its analysis at the start and verify during self-audit.
Verdict: Exactly correct. Three-section profile with calibration at Step 1 and verification at Step 9.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Claim accurate as stated | Supported | 95-99% |
| H2: Profile exists but used at only one point | Eliminated | — |
| H3: No profile or no usage | Eliminated | — |
Confidence: High · Sources: 3 · Searches: 2
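The rating labels in the claim headings above each pair with a probability band. As a minimal sketch, assuming only the three bands that appear in this report, the mapping could be expressed as a lookup. The function name `rating_for` and the decision to treat each lower bound as inclusive (so that 80% reads as "Very likely" rather than "Likely") are illustrative assumptions, not part of the methodology.

```python
from typing import Optional

# Bands as they appear in the claim headings of this report.
# Assumption: lower bounds are inclusive, so a shared edge such as 80%
# resolves to the higher band. Bands are checked from highest to lowest.
BANDS = [
    (95, 99, "Almost certain"),
    (80, 95, "Very likely"),
    (55, 80, "Likely"),
]


def rating_for(probability_pct: float) -> Optional[str]:
    """Return the rating label for a percentage, or None if no band listed here covers it."""
    for low, high, label in BANDS:
        if low <= probability_pct <= high:
            return label
    return None
```

Values outside the three bands return `None` because this report does not show labels for them.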
Collection Analysis
Cross-Cutting Patterns
| Pattern | Claims Affected | Significance |
|---|---|---|
| Primary source verification dominates structural claims | C004, C005, C006, C007 | Four of seven claims are about the prompt's own structure and are verifiable directly from the primary source, yielding "Almost certain" ratings |
| External claims require more nuanced assessment | C001, C002, C003 | Claims requiring external evidence yield lower but still positive probability ratings |
| Researcher conflict of interest (COI) is pervasive but manageable | All claims | The researcher authored both the prompt and the article series, creating a systematic COI. For structural claims this is inconsequential; for comparative claims (C001) it requires active compensation |
| Sycophancy research provides strongest external evidence | C002, C003 | Anthropic's own sycophancy research and the semantic override paper provide the strongest external evidence base |
Collection Statistics
| Metric | Value |
|---|---|
| Claims investigated | 7 |
| Fully confirmed (Almost certain) | 4 (C004, C005, C006, C007) |
| Confirmed with nuance (Very likely) | 2 (C002, C003) |
| Confirmed with caveats (Likely) | 1 (C001) |
| Partially confirmed or lower | 0 |
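The counts in the table above can be reproduced from the per-claim ratings. The following is a rough sketch: the ratings are copied from the claim headings in this report, while the names `RATINGS` and `tally` are hypothetical.

```python
from collections import Counter

# Rating per claim, copied from the claim headings in this report.
RATINGS = {
    "C001": "Likely",
    "C002": "Very likely",
    "C003": "Very likely",
    "C004": "Almost certain",
    "C005": "Almost certain",
    "C006": "Almost certain",
    "C007": "Almost certain",
}


def tally(ratings: dict) -> Counter:
    """Count how many claims received each rating label."""
    return Counter(ratings.values())


counts = tally(RATINGS)
for label, n in counts.most_common():
    print(f"{label}: {n}")
```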
Source Independence Assessment
The evidence base draws from multiple independent sources: Anthropic's primary sycophancy research (ICLR 2024), an independent academic survey on sycophancy (arXiv 2024), semantic override experiments (arXiv 2026), medical sycophancy research (PMC 2025), the Cochrane Handbook, and the prompt documents themselves. No two sources share authorship or funding. The primary limitation is that structural claims (C004-C007) rely heavily on the prompt as primary source, which is appropriate but limits the diversity of the evidence base for those claims.
Collection Gaps
| Gap | Impact | Mitigation |
|---|---|---|
| No controlled experiment testing multi-step workflow compliance in LLMs | Holds C003 at "Very likely" rather than "Almost certain" | Converging evidence from four independent research streams partially compensates |
| No comprehensive registry of published AI research prompts | Holds C001 at "Likely" rather than "Very likely" | An active search for competing prompts partially compensates |
| No functional test of output format swappability | C006 is structurally verified but not functionally tested | Structural evidence is sufficient for the claim as stated |
Collection Self-Audit
| Domain | Rating | Notes |
|---|---|---|
| Eligibility criteria | Low risk | Clear criteria defined for each claim before searching |
| Search comprehensiveness | Some concerns | External claims (C001-C003) had comprehensive web searches; structural claims relied appropriately on primary sources |
| Evaluation consistency | Low risk | Same framework applied across all seven claims |
| Synthesis fairness | Low risk | Contradictory evidence actively sought; COI consistently noted |
Resources
Summary
| Metric | Value |
|---|---|
| Claims investigated | 7 |
| Files produced | 187 |
| Sources scored | 18 |
| Evidence extracts | 18 |
| Results dispositioned | 37 selected + 76 rejected = 113 total |
Tool Breakdown
| Tool | Uses | Purpose |
|---|---|---|
| WebSearch | 16 | Search queries across sycophancy, prompt engineering, IC frameworks, COI |
| WebFetch | 6 | Page content retrieval for key sources |
| Write | 120 | File creation for all claim outputs |
| Read | 3 | Reading methodology and output format specs |
| Edit | 0 | No file modifications needed |
| Bash | 5 | Directory creation, batch file creation, file counting |
Token Distribution
| Category | Tokens |
|---|---|
| Input (context) | ~350,000 |
| Output (generation) | ~80,000 |
| Total | ~430,000 |
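The totals reported in the Resources tables can be cross-checked with simple arithmetic. This is a sketch under two assumptions: that "Sources scored" is the sum of the per-claim "Sources" counts (the report does not say whether any source is shared between claims), and that the approximate token figures are meant to add up exactly. The name `check_totals` is hypothetical.

```python
# Figures copied from this report.
SOURCES_PER_CLAIM = {
    "C001": 2, "C002": 3, "C003": 4, "C004": 1,
    "C005": 2, "C006": 3, "C007": 3,
}
SOURCES_SCORED = 18
SELECTED, REJECTED, DISPOSITIONED = 37, 76, 113
INPUT_TOKENS, OUTPUT_TOKENS, TOTAL_TOKENS = 350_000, 80_000, 430_000  # approximate


def check_totals() -> dict:
    """Return a pass/fail flag for each reported total."""
    return {
        # Assumption: no source is counted under more than one claim.
        "sources": sum(SOURCES_PER_CLAIM.values()) == SOURCES_SCORED,
        "dispositions": SELECTED + REJECTED == DISPOSITIONED,
        "tokens": INPUT_TOKENS + OUTPUT_TOKENS == TOTAL_TOKENS,
    }


for name, ok in check_totals().items():
    print(f"{name}: {'consistent' if ok else 'mismatch'}")
```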