# R0049/2026-03-31-02
## Queries
### Q001 — AI Research Prompt Frameworks
Has anyone published a complete, usable AI/LLM system prompt implementing a full analytical rigor framework for research?
Very unlikely (05-20%). No published system prompt implements a complete analytical rigor framework.
| Hypothesis | Status |
|---|---|
| H1 — Complete prompts exist | Not supported |
| H2 — Only narrow-task prompts exist | Partially supported |
| H3 — Partial implementations exist | Supported |
### Q002 — Unified IC + Scientific Methodology
Has anyone published a systematic combination of IC analytical standards with scientific methodology frameworks into a unified methodology?
Very unlikely (05-20%). Multiple scholars have called for bridging the two traditions, but none has produced a unified framework.
| Hypothesis | Status |
|---|---|
| H1 — Unified frameworks exist | Not supported |
| H2 — Domains entirely siloed | Partially supported |
| H3 — Partial bridges exist | Supported |
### Q003 — AI Research Tools with Structured Frameworks
What AI research tools implement structured analytical frameworks beyond simple chat?
Confidence: High. A rich tool ecosystem exists, but no tool implements all five queried analytical rigor dimensions.
| Hypothesis | Status |
|---|---|
| H1 — Comprehensive framework tools exist | Not supported |
| H2 — No structured features in any tool | Not supported |
| H3 — Partial structured features exist | Supported |
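The likelihood language in Q001 and Q002 ("very unlikely (05-20%)") matches the verbal probability bands of ICD 203; the report does not cite the standard by name, so reading the judgments as ICD 203 bands is an inference. A minimal sketch of that mapping, with a helper name of our own choosing:

```python
# ICD 203 verbal probability bands (band edges from the public standard).
# Only the band edges come from ICD 203; the helper itself is illustrative.
ICD203_BANDS = [
    ((0.01, 0.05), "almost no chance"),
    ((0.05, 0.20), "very unlikely"),      # the band used by Q001 and Q002
    ((0.20, 0.45), "unlikely"),
    ((0.45, 0.55), "roughly even chance"),
    ((0.55, 0.80), "likely"),
    ((0.80, 0.95), "very likely"),
    ((0.95, 0.99), "almost certain"),
]

def verbal_probability(p: float) -> str:
    """Return the ICD 203 term for a numeric probability (first band wins
    at shared endpoints, an ambiguity the standard itself leaves open)."""
    for (lo, hi), term in ICD203_BANDS:
        if lo <= p <= hi:
            return term
    raise ValueError(f"{p} is outside the ICD 203 range (0.01-0.99)")

assert verbal_probability(0.10) == "very unlikely"
```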
## Collection Analysis
### Cross-Cutting Patterns
- The "partial but not comprehensive" pattern: All three queries converge on H3 (partial implementations exist). The field has produced narrow-task tools (Q001), parallel-but-separate traditions (Q002), and citation-focused platforms (Q003), but no comprehensive integration in any dimension.
- The prompt-vs-code divide: Comprehensive research systems exist (Agent Laboratory, AI-Researcher) but are implemented in code rather than as system prompts (Q001). This suggests that the complexity of full analytical rigor exceeds what current prompt architecture can effectively encode.
- The citation-transparency ceiling: All major AI research tools implement some form of citation transparency (Q003), but none goes beyond citing sources to implement analytical rigor — probability calibration, bias assessment, competing hypotheses, or self-audit (see the sketch after this list). Citation is the floor, not the ceiling, of analytical rigor.
- The parallel traditions gap: IC analytical standards and scientific methodology frameworks address the same fundamental challenges (evidence quality, uncertainty, bias) through independently developed solutions (Q002). Neither community has adopted the other's specific tools, creating an integration opportunity that remains unfilled.
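The five dimensions named above lend themselves to a simple coverage check. The sketch below is illustrative only: the dimension names restate this section's terms, and the example profile is hypothetical, not a scored finding from the collection.

```python
# Illustrative coverage check over the five analytical rigor dimensions
# discussed above. The example profile is hypothetical, not a finding.
RIGOR_DIMENSIONS = (
    "citation_transparency",
    "probability_calibration",
    "bias_assessment",
    "competing_hypotheses",
    "self_audit",
)

def coverage(profile: dict[str, bool]) -> str:
    """Classify a tool as comprehensive, partial, or absent on rigor."""
    hits = sum(profile.get(dim, False) for dim in RIGOR_DIMENSIONS)
    if hits == len(RIGOR_DIMENSIONS):
        return "comprehensive"  # no tool in this collection reached this
    return "partial" if hits else "absent"

# A profile at the citation-transparency ceiling classifies as partial:
assert coverage({"citation_transparency": True}) == "partial"
```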
### Statistics
| Metric | Value |
|---|---|
| Queries investigated | 3 |
| Hypotheses tested | 9 |
| Hypotheses supported | 3 (all H3 variants) |
| Searches executed | 10 |
| Results dispositioned | 26 selected + 74 rejected = 100 total |
| Sources scored | 20 |
| Evidence extracts | 20 |
### Source Independence
Sources across the three queries are largely independent:
- Q001 sources: Academic surveys (Prompt Report), leaked prompts (Perplexity, OpenAI), open-source implementations (sroberts), conference papers (Agent Laboratory, AI-Researcher)
- Q002 sources: IC methodology (CIA Primer, Heuer & Pherson), academic bridging (Treverton, Prunckun, Tecuci, Marcoci)
- Q003 sources: Commercial platforms (Elicit, Scite, Perplexity), open-source tools (STORM, GPT-Researcher, Khoj), academic evaluation (JMIR)
Cross-query overlap: Perplexity appears in both Q001 (system prompt analysis) and Q003 (tool evaluation), providing consistent evidence from different angles.
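That independence claim can be checked mechanically. A minimal sketch using only the source names listed in the bullets above, abbreviated as they appear there:

```python
# Cross-query overlap check over the sources named in this section.
sources = {
    "Q001": {"Prompt Report", "Perplexity", "OpenAI", "sroberts",
             "Agent Laboratory", "AI-Researcher"},
    "Q002": {"CIA Primer", "Heuer & Pherson", "Treverton", "Prunckun",
             "Tecuci", "Marcoci"},
    "Q003": {"Elicit", "Scite", "Perplexity", "STORM",
             "GPT-Researcher", "Khoj", "JMIR"},
}

# Pairwise intersections; only Perplexity recurs, in Q001 and Q003.
for a in sources:
    for b in sources:
        if a < b and (shared := sources[a] & sources[b]):
            print(a, b, shared)  # prints: Q001 Q003 {'Perplexity'}
```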
### Collection Gaps
- Classified IC literature: The intelligence community may have internal frameworks or AI implementations not accessible through open search.
- Custom GPT ecosystem: Thousands of custom GPTs with private prompts may implement analytical frameworks not publicly documented.
- Enterprise tools: Palantir AIP, IBM Watson, and similar enterprise platforms may implement analytical rigor features not covered in public reviews.
- Non-English sources: Research was limited to English-language sources.
- Rapid evolution: The AI tool landscape changes faster than any point-in-time assessment can capture.
### Collection Self-Audit
| Criterion | Assessment |
|---|---|
| Search comprehensiveness | Pass — 10 searches across 3 queries covering academic, commercial, open-source, and leaked sources |
| Evidence quality | Pass — 20 sources scored, mix of peer-reviewed and primary artifacts |
| Hypothesis testing | Pass — 9 hypotheses with supporting and contradicting evidence evaluated |
| Bias management | Pass — Researcher bias checks performed for each query |
| Convergence | Strong — All three queries converge independently on the same meta-finding (partial implementations only) |
| Independence | Pass — Sources across queries are largely independent |
## Resources
### Summary
| Metric | Value |
|---|---|
| Queries investigated | 3 |
| Files produced | 101 |
| Sources scored | 20 |
| Evidence extracts | 20 |
| Results dispositioned | 26 selected + 74 rejected = 100 total |
### Tool Breakdown
| Tool | Uses | Purpose |
|---|---|---|
| WebSearch | 22 | Search queries |
| WebFetch | 5 | Page content retrieval |
| Write | 101 | File creation |
| Bash | 5 | Directory creation and management |
### Token Distribution
| Category | Tokens |
|---|---|
| Input (context) | ~150,000 |
| Output (generation) | ~80,000 |
| Total | ~230,000 |