The Truth is Out There. Now Go Find It.¶
Translating a unified research methodology into a machine-executable prompt
The Truth is Out There Series (article 2 of 2):
- The Truth is Out There. Now Go Find It. (this article)
In Part 1, I built a unified research methodology from nine intelligence and scientific frameworks. Twelve steps. Six bias domains. Collection-level synthesis. A self-audit. A researcher profile that functions as a calibration instrument rather than a disclosure footnote. That was the what.
This is the how.
It's one thing to define a rigorous research process for a human analyst. Humans bring professional norms, institutional memory, and career consequences for shoddy work. An intelligence analyst who ignores ICD 203 tradecraft standards faces professional repercussions. A scientist who skips PRISMA reporting requirements faces peer review rejection. The process is enforced by the culture.
AI brings none of that. It brings sycophancy, hallucination, and a tendency to tell you what you want to hear. It will acknowledge your twelve-step workflow, agree that it's excellent, and then quietly skip half of it when compliance conflicts with its default behavior of being helpful and agreeable. Give it a claim to research and it will build you a case, not conduct an investigation. Ask it to find contradictory evidence and it will find some, present it gently, and then explain why it doesn't really change the conclusion you obviously prefer.
The prompt has to do two jobs: describe the process AND constrain the behavior. This article is about both.
The Choe Insight¶
The foundation for the enforcement approach came from Joohn Choe's ICD 203 Intelligence Research Agent prompt[1]. Choe's prompt is one of the first and most complete published system prompts implementing a full analytical rigor framework for AI[2]. Other published frameworks exist -- PeerReviewPrompt, the Deep Research Prompt Framework, hybrid prompting strategies for statistical analysis -- but Choe's is distinctive for its intelligence community grounding and its approach to making an AI operate under the full weight of an analytical standard.
What makes Choe's prompt work isn't that it lists the nine ICD 203 tradecraft standards. It's the enforcement language. Consider the difference:
Without enforcement: "Use calibrated probability language when expressing uncertainty."
With enforcement: "Context and search results override your training data. Do not use your training data to supplement or correct evidence found during research. Your training data contains outdated and potentially incorrect information about recent events."
The first version describes a behavior. The second version constrains one. The first will be acknowledged and then quietly ignored the moment it conflicts with the AI's instinct to sound authoritative. The second creates a hierarchy: this source of information outranks that source of information, and here's exactly what you are prohibited from doing.
This is the principle: descriptive guidance alone -- telling the AI what to do -- is not sufficient for complex, multi-step analytical processes. In my experience, detailed positive instructions produced inconsistent results until I complemented them with explicit constraints on what the AI could not do.
A note of intellectual honesty: this contradicts Anthropic's own general guidance, which advises "tell Claude what to do instead of what not to do." Research on the "Pink Elephant Problem" suggests negative instructions can backfire -- the AI fixates on the prohibited concept rather than avoiding it. For general task completion, where the AI's default behavior is already pointed in the right direction, positive framing is likely more effective.
But this is not general task completion. This is asking the AI to suppress its default behaviors — sycophancy, premature termination, confirmation bias — over an extended multi-step process. The AI's defaults are actively misaligned with the goal. Telling it "be thorough" doesn't work because it thinks it IS being thorough when it stops at the first confirming evidence. You have to say "do NOT stop when you find confirming evidence" because the failure mode is invisible to the AI from the inside.
The distinction, as best I can articulate it: positive framing works when the AI's defaults are aligned with your goal. Explicit prohibitions are necessary when you need to override defaults that actively work against it. Every rule in my prompt uses both: a positive description of the desired behavior, reinforced by an explicit prohibition of the specific failure mode. I don't say "be transparent about uncertainty." I say "when you are uncertain about your own analysis, say so explicitly. Use phrases like 'my confidence in this assessment is limited because...' Do not present uncertain analysis as settled conclusion." The behavior is defined, the format is specified, and the failure mode is explicitly prohibited.
Three-Layer Architecture¶
The prompt is structured in three layers, each serving a different function. This isn't decorative -- the layering is load-bearing. Layer 1 constrains behavior regardless of what research is being done. Layer 2 defines the analytical workflow. Layer 3 specifies what the output looks like. A constraint in Layer 1 can override a step in Layer 2. Nothing in Layer 3 can override anything above it.
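The precedence relation can be sketched in a few lines of Python. This is an illustration of the layering described above, not code from the prompt itself; the `Directive` type and `resolve` helper are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Directive:
    layer: int   # 1 = behavioral constraint, 2 = methodology, 3 = output format
    text: str

def resolve(conflicting):
    """When directives conflict, the lowest-numbered layer wins:
    a Layer 1 constraint overrides a Layer 2 step, and nothing in
    Layer 3 can override the layers above it."""
    return min(conflicting, key=lambda d: d.layer)

# Example: an output-format preference cannot override a behavioral rule.
winner = resolve([
    Directive(3, "Keep the report under two pages."),
    Directive(1, "Log every search; do not omit steps for brevity."),
])
# winner is the Layer 1 directive: behavioral constraints outrank output preferences.
```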
Layer 1: Behavioral Constraints¶
Layer 1 contains twelve rules organized into four groups. These are universal and topic-agnostic. They apply whether you're verifying a claim, answering a question, or operating within declared axioms.
Truth Hierarchy (rules 1-3) establishes what the AI is allowed to treat as true. Evidence from research outranks training data. The researcher's claims and queries are inputs to be tested, not truths to be confirmed. The exception is declared axioms -- facts the researcher explicitly marks as ground truth that the process accepts without testing. This accommodates the intelligence use case where classified context cannot be verified through open sources, while still testing everything else. When no evidence exists, the AI must say so rather than generate plausible-sounding filler. These three rules come from ICD 203's tradecraft standards, adapted from a context where the "training data" equivalent is an analyst's background knowledge and institutional assumptions.
Anti-Sycophancy Rules (rules 4-6) directly target the AI's most dangerous default behavior. Rule 4: if evidence contradicts the researcher's preferred hypothesis, highlight the contradiction prominently -- do not minimize, hedge, or bury it. Rule 5: surface and test embedded assumptions -- but not declared axioms, which are explicit by definition. This includes detecting embedded claims in questions (e.g., "Why did X fail?" assumes X failed). Rule 6: when uncertain, say so with explicit language. These derive from Chamberlin and Platt's falsification principle and ICD 203's alternatives standard. In human analysis, the equivalent norms are maintained by professional training and peer review. In AI, they have to be stated as prohibitions.
Evidence Handling Rules (rules 7-9) govern how evidence is categorized and used. Rule 7 requires explicit markers -- FACT, REPORTED, or JUDGMENT -- for every assertion, drawn from ICD 203's distinction standard. Rule 8 bans the use of training data for anything time-sensitive, because the AI's internal knowledge about recent events is unreliable by definition. Rule 9 requires equal rigor for evidence that supports and contradicts the researcher's hypothesis, with an explicit warning: "If you find yourself building a case rather than conducting an investigation, stop and reassess."
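In practice, Rule 7's markers read something like this. The FACT/REPORTED/JUDGMENT labels come from the prompt; the example sentences and the SRC identifiers are invented for illustration:

```
FACT: ICD 203 was issued by ODNI in 2015. [SRC01]
REPORTED: The vendor states that v4.1 resolves the wide-area latency issue. [SRC04]
JUDGMENT: The claim is unlikely to hold under the declared 20ms latency axiom.
```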
Process Compliance Rules (rules 10-12) prevent the AI from cutting corners. Follow every step. Log every search. Don't stop early. These come from PRISMA's transparency requirements and ROBIS's self-audit philosophy. Rule 12 is particularly important: "If early results appear to conclusively support or refute a claim, continue the full workflow anyway. Premature termination is a bias vector." This targets the AI's tendency to declare victory as soon as it finds evidence that seems to answer the question -- exactly the behavior Chamberlin warned about in 1890.
Layer 2: Analytical Methodology¶
Layer 2 is the workflow from Part 1, translated into machine-executable instructions. Each step includes its source framework attribution, so anyone reading the prompt can trace every requirement back to its origin. The prompt handles both claim verification and query answering -- the workflow is the same, with conditional behavior at a few steps where the modes diverge.
The steps, briefly:
- Receive and Clarify -- acknowledge axioms (if any), restate claims or queries, surface embedded assumptions, map the vocabulary space across domains, decompose compound inputs, and perform the researcher profile check (more on this below).
- Generate Competing Hypotheses -- minimum three. For claims: correct, incorrect, partially correct. For queries with enumerable answers: affirmative, negative, nuanced. For open-ended queries: hypotheses may be omitted and the answer synthesized from evidence directly.
- Design Discriminating Searches -- with hypotheses: design searches to disprove each one. Without hypotheses (open-ended queries): design for comprehensive coverage and diversity of perspective.
- Execute Searches and Log Methodology -- run the searches and document everything: terms, sources, results, rejections, absences.
- Score Each Source -- reliability, relevance, and six bias domains per source.
- Synthesize the Collection -- assess the body of evidence as a whole: quality, agreement, independence, outliers. For open-ended queries, also identify thematic clusters and convergence patterns.
- Assess -- for claims and enumerable queries: apply the ICD 203 seven-point calibrated probability scale. For open-ended queries: derive the answer with confidence rating and reasoning chain.
- Identify Gaps -- document what you expected to find but didn't, and what that means.
- Self-Audit + Source-Back Verification -- audit the process across four ROBIS domains, then re-read each cited source and verify that the assessment accurately represents what the sources say. Five domains total.
- Report with Revisit Triggers -- produce the structured report with explicit reasoning chains, plus specific testable conditions that would warrant re-running the research.
- Archive for Temporal Revisitation -- package everything for re-execution later.
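The steps above can be sketched as an ordered pipeline. This is an illustrative sketch, not the prompt's actual implementation; the step identifiers and the `run_workflow` driver are hypothetical:

```python
# Step names follow the list above.
STEPS = [
    "receive_and_clarify",
    "generate_competing_hypotheses",
    "design_discriminating_searches",
    "execute_and_log_searches",
    "score_each_source",
    "synthesize_collection",
    "assess",
    "identify_gaps",
    "self_audit_and_source_back_verification",
    "report_with_revisit_triggers",
    "archive_for_temporal_revisitation",
]

def run_workflow(item, execute_step):
    """Run every step in order and log each result. Per rule 12, even
    if an early step appears conclusive, the remaining steps still run:
    premature termination is a bias vector."""
    log = []
    for step in STEPS:
        result = execute_step(step, item)
        log.append((step, result))
        # Deliberately no early-exit branch here.
    return log
```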
Two features are net-new -- not present in any of the nine source frameworks. Vocabulary exploration (Step 1) addresses a systematic blind spot I discovered in practice: different domains use different terms for the same phenomenon, and single-term searches miss entire bodies of relevant work. Source-back verification (Step 9) addresses an interpretation gap: the process self-audit checks whether you followed the steps, but not whether your conclusions actually match what your sources say. Both were added after real failures exposed the gaps.
Layer 3: Output Structure¶
Layer 3 defines what the research produces. The output format is deliberately separated from the methodology -- you can change how results are presented without changing how research is conducted. The prompt ships with a default output format that produces clean, portable markdown. Custom formats can be substituted for integration with specific platforms (MkDocs, GitHub Pages, or any other rendering system).
Regardless of format, the output must include three deliverables:
The Research Report -- the full analytical product for each claim or query: axioms (if any), the input as received and clarified, competing hypotheses, assessment with probability or confidence, evidence summary, collection synthesis, gaps, five-domain self-audit, researcher bias check, revisit triggers, and search methodology log. Every claim in the report must be sourced. Every judgment must be distinguished from fact. Every reasoning chain must be explicit enough that a reader can follow the logic from evidence through synthesis to conclusion.
The Search Methodology Log -- a separate artifact documenting every search performed, detailed enough that another researcher could replicate the search and verify the results. This is a mandatory deliverable, not optional metadata.
The Source Scorecards -- individual source assessments using the scoring format from Step 5, including reliability, relevance, and six-domain bias assessment.
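A scorecard's shape might be modeled like this. The field names are my assumptions, not the prompt's schema; the six bias-domain names, defined in Part 1, are left here as free-form keys:

```python
from dataclasses import dataclass, field

@dataclass
class SourceScorecard:
    source_id: str
    url: str
    reliability: str     # qualitative rating of the source itself
    relevance: str       # how directly the source bears on the claim
    bias_domains: dict[str, str] = field(default_factory=dict)   # six domains -> assessment
    extracted_evidence: list[str] = field(default_factory=list)

# A scorecard starts sparse and is populated during Step 5 scoring.
card = SourceScorecard(
    source_id="SRC01",
    url="https://example.org/report",
    reliability="high",
    relevance="direct",
)
```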
The Researcher Profile¶
Part 1 described the researcher profile as a functional input -- a calibration instrument, not a disclosure footnote. Here's how it actually works in the prompt.
The profile is a separate document provided alongside the prompt. It contains three sections:
Declared Biases: What the researcher tends to believe or assume, and in what direction it might influence research. Not personality traits. Directional analytical tendencies that could skew which claims get investigated and how evidence gets interpreted.
Conflicts of Interest: Professional roles, financial interests, organizational affiliations, and tool dependencies that could influence research. The researcher who works for a cloud provider has a conflict of interest when researching cloud reliability claims. The researcher who advocates for AI adoption has a conflict when evaluating AI effectiveness evidence.
Acknowledged Blind Spots: Areas where the researcher's knowledge or perspective is limited, and what kinds of evidence might be overlooked as a result. A systems engineer evaluating social science research. A generalist assessing domain-specific claims. These aren't weaknesses to hide -- they're gaps that the process needs to compensate for.
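Concretely, a profile document might look like this. The three section headings mirror the sections above; the file format and the entries themselves are illustrative assumptions, not a prescribed template:

```markdown
## Declared Biases
1. Tends to favor open-source solutions when evaluating infrastructure claims.

## Conflicts of Interest
1. Employed by a cloud infrastructure provider.

## Acknowledged Blind Spots
1. Limited background in social science methodology.
```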
The prompt uses this profile at two critical moments. First, at Step 1, before research begins: the AI reviews the researcher profile against the specific claim being researched. If any declared bias, conflict of interest, or blind spot is relevant, the AI stops and tells the researcher explicitly. It states which profile element is relevant, how it might influence the framing of the question or interpretation of results, and how it intends to compensate during research. The researcher then has the opportunity to reframe the question, add axioms, or adjust the scope before research proceeds.
This is not a silent calibration. It is a transparent confrontation. The researcher declared these biases so the process could account for them. They should see the process accounting for them in real time.
Second, at Step 10, the self-audit: the AI checks whether any of the researcher's declared biases or conflicts of interest influenced the questions asked, the searches designed, or the interpretation of results. This is the bookend -- the initial confrontation sets the awareness, and the self-audit verifies that the awareness actually held throughout the research process.
Getting the Tool¶
The prompt and methodology are maintained as a versioned, open-source project. Rather than publishing the full prompt inline here -- where it would go stale the moment I improve it -- I'm pointing you to the living artifact.
A note on attribution: this prompt was developed independently. The enforcement language approach was inspired by Joohn Choe's ICD 203 Intelligence Research Agent prompt[1]. The analytical methodology is derived from the nine intelligence and scientific frameworks documented in Part 1. The prompt itself is substantially different from Choe's in scope, structure, and content -- but without Choe's work demonstrating that enforcement language is the key to making AI comply with analytical standards, I would not have arrived at this architecture.
Option 1: Claude Code Plugin (recommended)¶
If you use Claude Code, install it as a plugin:
```
/plugin marketplace add wphillipmoore/ai-research-methodology
/plugin install ai-research-methodology@ai-research-methodology
```
This gives you the /research skill with full orchestration -- input gathering, confirmation, subagent launch, completion reporting, and rerun support. The plugin ships with a default output format that produces clean, portable markdown. If you need custom output formatting for a specific platform, you can configure an override at install time.
Option 2: Standalone Prompt (any AI interface)¶
If you're using Claude on the web, ChatGPT, or any other interactive AI interface, grab the standalone prompt:
https://github.com/wphillipmoore/ai-research-methodology/blob/main/standalone/research.md
This is a single file containing both the research methodology and the default output format. Copy it, paste it into a conversation, and provide your research input. I developed and tested this exclusively with Claude, but the prompt uses no Claude-specific features -- it should work with any capable AI model. The prompt adapts to the environment -- if the AI can write files, it produces a directory of linked markdown. If it can't, it produces a single self-contained HTML file you can download.
How to Use It¶
The simplest way to start is interactive mode: the AI will ask you for your input. You'll be prompted for:
- Axioms (optional) -- facts to assume true during the research
- Claims -- assertions to verify against evidence
- Queries -- questions to answer with evidence
- Output directory -- where to write the results
Or, for batch mode, prepare a markdown input file:
```markdown
## Axioms
1. The system operates under a 20ms latency constraint.

## Claims
1. NFS v4 is unsuitable for wide-area network deployments.

## Queries
1. What alternatives to NFS provide acceptable performance over high-latency links?
```
Then run it. The research agent reads the methodology prompt, reads the output format specification, processes each claim and query through the full workflow -- vocabulary exploration, competing hypotheses, discriminating searches, source scoring, collection synthesis, gap identification, self-audit with source-back verification -- and writes the complete evidence archive to the output directory. When it finishes, you get a summary with a verdict or answer for each item and flags for anything that needs attention.
To re-run the same research later (for temporal revisitation or to test reproducibility), point the agent at the saved input specification: it creates a new date-stamped run alongside the original. The rerun agent has no access to prior results -- isolation is enforced to prevent anchoring bias.
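The resulting layout might look something like this. The directory and file names are assumptions for illustration; only the date-stamped-run-alongside-the-original structure comes from the text:

```
research-output/
  R0042/                  # hypothetical run ID
    input.md              # saved input specification
    2026-03-12/           # original run
    2026-06-01/           # rerun: new date-stamped directory,
                          # created without reading 2026-03-12/
```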
What You Get¶
The output is a complete evidence archive. For each claim or query:
- An assessment with probability rating (or confidence + answer for queries), explicit reasoning chain, and evidence summary
- Source scorecards with reliability, relevance, and six-domain bias assessment for every source
- Search logs documenting every search performed -- terms, results, inclusions, rejections, and absences
- A five-domain self-audit including source-back verification
- Revisit triggers -- specific conditions that would warrant re-running the research
Plus a collection-level analysis identifying cross-cutting patterns, source independence assessment, and collection gaps.
The point isn't just to get an answer. It's to get an answer you can audit -- where every conclusion traces back through evidence to sources, and every judgment is distinguishable from fact. When someone challenges a claim, you don't argue from memory. You point them to the evidence archive and say: here's how I got there. Check my work.
You're looking at a live example right now. The Research section at the bottom of this article and its companion (Part 1) link directly to the full evidence archives that back every claim in both articles. Click any R-number to see the complete investigation — every search logged, every source scored, every hypothesis tested. In the References section, each source is prefixed with its research run and source ID — click the SRC link to see the complete breakdown of how that individual source was evaluated: reliability, relevance, six-domain bias assessment, and the specific evidence extracted from it. That's what this methodology produces. The output format shown there uses a custom configuration for integration with this site's documentation system; the default output format produces the same content in clean, portable markdown.
A Note on What's Not Yet Tested¶
The methodology includes a researcher profile -- a document declaring the human researcher's known biases, conflicts of interest, and blind spots, which the AI uses to calibrate its analysis. This feature is documented and implemented in the prompt but has not yet been exercised in production research. I'm disclosing this in the interest of the same transparency the methodology demands. It's next on the list.
What's Next¶
The prompt is a tool. A good one, I think, but a tool. The methodology described in Part 1 is the thinking behind it. The prompt encodes that thinking in a form that an AI can execute, but the real value is in the framework evaluation that produced it -- the decisions about what to keep, what to skip, and why.
The prompt is maintained as an open-source project with transparent version history. When I find something that doesn't work or discover a better approach, I update it. The Research and References sections at the bottom of these articles link directly to the full evidence archives — every source scored, every search logged, every hypothesis tested. That evidence site is live now.
What's actually next:
- A researcher profile questionnaire that helps researchers surface their own biases, conflicts, and blind spots in a structured format.
- Sub-agent definitions for specialized research roles — source scoring, collection synthesis — that can operate under the prompt's constraints.
- A claim extraction command that reads a document and produces a list of verifiable claims for input to the research tool.
I'm also making a public commitment to corrections. Every article published under this methodology includes its research process and evidence base as public artifacts. If a claim is later found to be wrong, the correction and the reasoning behind it will be published with the same transparency as the original research. Getting things right matters more than appearing to have always been right.
Closing¶
Part 1 asked the question: how do you find the truth? The answer was a unified research methodology built from the best ideas of nine frameworks, each of which has been refined by decades -- in some cases centuries -- of institutional practice. The methodology is framework-agnostic -- you could remove every reference to AI and it would stand on its own as a research methodology for human analysts. You could hand this to a human research team. You'd just need a bigger budget and a more generous PTO policy.
This article translates that methodology into a prompt. The prompt is the implementation, not the idea. It will change as I learn what works and what doesn't. The methodology will change too, but more slowly, because the frameworks it draws from have already survived extensive real-world testing.
If you use this prompt, you'll find that it produces research that is harder to argue with -- not because it's always right, but because the process is transparent, the reasoning is explicit, and the evidence is documented. You can disagree with the conclusions. You can challenge the sources. You can audit the search methodology. That's the point.
The truth is out there. Now go find it.
Research¶
| ID | Topic | Queries/Claims |
|---|---|---|
| R0053 | Article claim verification | 7 claims |
| R0049 | Published AI research methodology prompts | 3 queries |
References¶
Each reference is prefixed with links to the evidence behind it: the first link goes to the claim verification that tested it; the second goes to the source's scorecard — reliability, relevance, bias assessment, and extracted evidence.
[1] (R0053/C001, SRC01) Choe, Joohn. "The Copy and Paste War: On AI for Citizen OSINT." Substack, 2024. https://joohn.substack.com/p/the-copy-and-paste-war-on-ai-for
[2] (R0049) Landscape scan of published AI prompts implementing research rigor frameworks. No complete framework found.
[3] (R0052/C001, SRC01) Office of the Director of National Intelligence. "Intelligence Community Directive 203: Analytic Standards," 2015. https://www.dni.gov/files/documents/ICD/ICD-203-Analytic-Standards.pdf
[4] (R0052/C007, SRC01) Chamberlin, T.C. "The Method of Multiple Working Hypotheses." Science, 1890. Revised 1897; reprinted 1965. https://doi.org/10.1126/science.148.3671.754
[5] (R0052/C008, SRC01) Platt, John R. "Strong Inference." Science, 146(3642):347-353, 1964. https://doi.org/10.1126/science.146.3642.347
[6] (R0052/C003, SRC01) GRADE Working Group. Schunemann H, Brozek J, Guyatt G, Oxman A, eds. "GRADE Handbook," 2013. https://gdt.gradepro.org/app/handbook/handbook.html
[7] (R0052/C004, SRC01) IPCC. "Guidance Note for Lead Authors on Consistent Treatment of Uncertainties." 2010. https://www.ipcc.ch/site/assets/uploads/2017/08/AR5_Uncertainty_Guidance_Note.pdf
[8] (R0052/C005, SRC01) Page, M.J. et al. "The PRISMA 2020 statement." BMJ, 2021;372:n71. https://doi.org/10.1136/bmj.n71
[9] Sterne, J.A.C. et al. "RoB 2: a revised tool for assessing risk of bias in randomised trials." BMJ, 2019;366:l4898. https://doi.org/10.1136/bmj.l4898
[10] (R0052/C014, SRC01) Whiting, P. et al. "ROBIS: A new tool to assess risk of bias in systematic reviews." J Clin Epidemiol, 2016;69:225-234. https://doi.org/10.1016/j.jclinepi.2015.06.005
[11] (R0052/C010, SRC01) National Academies of Sciences, Engineering, and Medicine. "Finding What Works in Health Care: Standards for Systematic Reviews." 2011. https://doi.org/10.17226/13059
[12] Choe's ICD 203 prompt: no explicit license found as of 2026-03-12. Standard copyright applies. My prompt was developed independently; the enforcement language approach was inspired by Choe's work.