R0049/2026-03-31¶
Q001 — AI Research Prompt Frameworks
Has anyone published a complete, usable AI/LLM system prompt that implements a full analytical rigor framework for research?
Very unlikely (5-20%) that a complete published framework exists. Confidence: Medium-High.
Supported: H3 — Partial implementations exist but none achieve comprehensive coverage
Partially supported: H1 — Full framework exists
Eliminated: H2 — Nothing exists
Q002 — Unified IC-Scientific Methodology
Has anyone published a systematic combination of intelligence community analytical standards with scientific methodology frameworks?
Very unlikely (5-20%) that such a combination exists. Confidence: Medium.
Supported: H2 — No combination published
Supported: H3 — Comparison without integration
Eliminated: H1 — Unified methodology exists
Q003 — AI Research Tools with Structured Frameworks
What AI-assisted research tools implement structured analytical frameworks beyond simple chat-based research?
Very unlikely (5-20%) that a comprehensive structured tool exists undiscovered. Confidence: Medium-High.
Supported: H3 — Isolated features only, no comprehensive tool
Eliminated: H1 — Multiple comprehensive tools exist
Eliminated: H2 — No features at all
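The "very unlikely (5-20%)" phrasing above follows ICD 203's standard likelihood bands, which tie each verbal expression of probability to a fixed numeric range. A minimal sketch of that mapping (band boundaries per the 2015 ICD 203 revision; boundary values are assigned to the lower band here, a convention the standard itself does not dictate):

```python
# ICD 203 likelihood bands: each tuple is (upper bound, expression).
# A point estimate maps to the first band whose upper bound contains it.
ICD203_BANDS = [
    (0.05, "almost no chance"),
    (0.20, "very unlikely"),
    (0.45, "unlikely"),
    (0.55, "roughly even chance"),
    (0.80, "likely"),
    (0.95, "very likely"),
    (1.00, "almost certain"),
]

def icd203_expression(p: float) -> str:
    """Return the ICD 203 expression for a probability p in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    for upper, expression in ICD203_BANDS:
        if p <= upper:
            return expression
    return "almost certain"

print(icd203_expression(0.12))  # → very unlikely (12% falls in the 5-20% band)
```

This is the kind of calibrated-probability feature the queries found absent from existing AI research tools.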
Collection Analysis¶
Cross-Cutting Patterns¶
The three queries converge on a single finding: the integration of formal analytical rigor methodology into AI research systems is a gap across every layer of the stack — from theoretical methodology (Q002) to operational prompts (Q001) to production tools (Q003).
Specific patterns observed:
- Efficiency over rigor: The AI research ecosystem universally optimizes for speed, volume, and citation accuracy. Analytical rigor features (competing hypotheses, calibrated probability, self-audit) are absent from all major tools.
- Parallel worlds: Intelligence community analytical standards (ICD 203, structured analytic techniques) and scientific methodology frameworks (GRADE, PRISMA, Cochrane, IPCC) have developed remarkably similar structures independently but have never been combined.
- Partial implementations are siloed: The few partial implementations found (Roberts' LLM SATs, Framework CoT, scite Smart Citations, Open Synthesis ACH, MS Copilot Critique) each address a single technique or feature in isolation, with no integration between them.
- The methodology gap drives the tools gap: Since no unified IC-scientific methodology exists (Q002), tools have no framework to implement (Q003), and prompts have no methodology to encode (Q001).
Collection Statistics¶
| Metric | Value |
|---|---|
| Queries | 3 |
| Hypotheses generated | 9 (3 per query) |
| Hypotheses supported | 4 |
| Hypotheses partially supported | 1 |
| Hypotheses eliminated | 4 |
| Searches executed | 9 |
| Total results returned | ~150 |
| Results selected | 22 |
| Results rejected | ~128 |
| Sources evaluated | 16 |
| Evidence items | 16 |
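The table's figures should be internally consistent: hypothesis outcomes must sum to the hypotheses generated, and selected plus rejected results must account for the total returned. A quick consistency check over the tabulated values (the `stats` dict is just a transcription of the table; "~" values are approximate):

```python
# Transcription of the collection statistics table above.
stats = {
    "hypotheses_generated": 9,
    "hypotheses_supported": 4,
    "hypotheses_partial": 1,
    "hypotheses_eliminated": 4,
    "results_returned": 150,   # approximate ("~150")
    "results_selected": 22,
    "results_rejected": 128,   # approximate ("~128")
}

# Outcomes partition the generated hypotheses: 4 + 1 + 4 = 9.
outcome_total = (stats["hypotheses_supported"]
                 + stats["hypotheses_partial"]
                 + stats["hypotheses_eliminated"])
assert outcome_total == stats["hypotheses_generated"]

# Selected + rejected should cover the (approximate) total returned.
assert stats["results_selected"] + stats["results_rejected"] == stats["results_returned"]

selection_rate = stats["results_selected"] / stats["results_returned"]
print(f"selection rate ≈ {selection_rate:.1%}")  # ≈ 14.7%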
Source Independence¶
Sources across the three queries are highly independent:
- Q001 sources: Academic publications (PRISMA-trAIce, Framework CoT), practitioner blog (Roberts), GitHub repositories (LLM Prompt Library, Agent Laboratory)
- Q002 sources: Peer-reviewed journals (Duke, IPCC, PLOS ONE), institutional report (RAND)
- Q003 sources: Open-source repositories (PaperQA2, STORM, GPT Researcher, Open Synthesis), commercial products (Elicit, scite), news coverage (MS Copilot Critique)
No source appears in more than one query. The convergent finding (gap across all layers) emerges from independent evidence streams.
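The independence claim is a pairwise set-disjointness check over the three queries' source lists. A sketch using shorthand labels for the sources named above (labels are informal identifiers, not canonical citations):

```python
from itertools import combinations

# Shorthand labels for the sources listed per query above.
q_sources = {
    "Q001": {"PRISMA-trAIce", "Framework CoT", "Roberts",
             "LLM Prompt Library", "Agent Laboratory"},
    "Q002": {"Duke", "IPCC", "PLOS ONE", "RAND"},
    "Q003": {"PaperQA2", "STORM", "GPT Researcher", "Open Synthesis",
             "Elicit", "scite", "MS Copilot Critique"},
}

# Independence holds iff every pair of query source sets is disjoint.
for (qa, sa), (qb, sb) in combinations(q_sources.items(), 2):
    shared = sa & sb
    assert not shared, f"{qa} and {qb} share sources: {shared}"

print("no source overlap across queries")
```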
Collection Gaps¶
| Gap | Queries affected | Impact |
|---|---|---|
| Classified/proprietary IC research | Q001, Q002 | Could contain unpublished framework integration |
| Enterprise tools (Palantir, Maltego) | Q003 | Proprietary features not verifiable |
| Non-English publications | Q001, Q002 | Minor — field is English-dominated |
| Dissertations/theses | Q002 | Most likely venue for novel cross-domain proposals |
| AI company internal research | Q001, Q003 | Anthropic, OpenAI, Google may have internal methodology prompts |
Collection Self-Audit¶
The consistent finding across all three queries — that comprehensive analytical rigor frameworks are absent from the AI research landscape — must be evaluated against the researcher's incentive to find this result (as the builder of such a framework). Three mitigation measures were applied:
- Searches designed to find disconfirming evidence: All search strategies were optimized to discover existing frameworks, which would refute the gap finding, not to confirm their absence.
- Generous evaluation of partial implementations: Roberts' SATs, Framework CoT, scite Smart Citations, and MS Copilot Critique were all documented as genuine prior art rather than dismissed.
- Transparent documentation of limitations: All search scope limitations (classified research, proprietary tools, non-English publications) are documented with their potential impact on confidence.
Overall collection risk: Low with some concerns — primarily about inaccessible proprietary and classified sources.
Resources¶
Summary¶
| Resource | Value |
|---|---|
| Web searches | 18 |
| Web fetches | 5 |
| Files produced | 90 |
| Duration (wall clock) | 30m 52s |
| Tool uses (total) | 137 |
Tool Breakdown¶
| Tool | Count |
|---|---|
| WebSearch | 18 |
| WebFetch | 5 (GitHub repos: PaperQA2, STORM, Open Synthesis, GPT Researcher, LLM Prompt Library) |
Token Distribution¶
| Phase | Approximate share |
|---|---|
| Search execution | 30% |
| Source evaluation | 20% |
| File generation | 50% |