R0049/2026-03-31/Q003-H3
Statement
AI-assisted research tools implement structured analytical features only in isolation; no tool achieves comprehensive coverage of the five target features. The dominant design pattern prioritizes research efficiency (speed, volume, coverage) over analytical rigor (confidence calibration, bias assessment, self-audit).
Status
Supported. This is the best-supported hypothesis. The evidence reveals a clear landscape pattern: tools optimize for differing value propositions, none of which treats analytical rigor as a primary design goal.
Supporting Evidence
| Evidence | Summary |
|---|---|
| SRC01-E01 | PaperQA2 — most advanced academic agent, optimizes for citation accuracy, lacks analytical rigor features |
| SRC02-E01 | STORM — optimizes for knowledge breadth and multi-perspective coverage |
| SRC03-E01 | Elicit — optimizes for screening efficiency, approaching human-level accuracy |
| SRC04-E01 | scite — implements citation context (one feature) but not a comprehensive framework |
| SRC05-E01 | MS Copilot Critique — cross-model audit (one feature), not full analytical rigor |
| SRC06-E01 | Open Synthesis — implements ACH (one feature); in maintenance mode, no AI |
| SRC07-E01 | GPT Researcher — volume-based quality ("most frequent information"), no formal rigor |
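The single-feature pattern above can be made concrete as a coverage matrix. The sketch below is illustrative only, assuming feature names: Q003's authoritative list of the five target features is not reproduced in this section, so the names used here (and the mapping of cross-model audit onto self-audit) are reconstructed from the Statement and the table rows, not taken from the sources.

```python
# Minimal coverage-matrix sketch. Feature names are assumptions
# reconstructed from the Statement and the evidence table above.
TARGET_FEATURES = {
    "confidence_calibration",  # named in the Statement
    "bias_assessment",         # named in the Statement
    "self_audit",              # named in the Statement
    "citation_context",        # assumed from the scite row (SRC04-E01)
    "ach",                     # assumed from the Open Synthesis row (SRC06-E01)
}

# Mapping mirrors the evidence table; an empty set means the tool
# implements none of the target features.
COVERAGE: dict[str, set[str]] = {
    "PaperQA2": set(),
    "STORM": set(),
    "Elicit": set(),
    "scite": {"citation_context"},
    "MS Copilot Critique": {"self_audit"},  # cross-model audit, treated as self-audit
    "Open Synthesis": {"ach"},
    "GPT Researcher": set(),
}

def missing_features(coverage: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return the target features each tool lacks."""
    return {tool: TARGET_FEATURES - feats for tool, feats in coverage.items()}

if __name__ == "__main__":
    for tool, missing in missing_features(COVERAGE).items():
        print(f"{tool}: {len(missing)}/5 target features missing")
    # Under these assumptions every tool misses at least four of the five
    # features, which is the coverage gap H3 describes.
    assert all(len(m) >= 4 for m in missing_features(COVERAGE).values())
```

Under these assumed names, every tool in the table misses at least four of the five features, which is the comprehensive-coverage gap the Statement claims.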
Contradicting Evidence
| Evidence | Summary |
|---|---|
| — | No contradicting evidence found |
Reasoning
The landscape can be categorized by primary design goal:
- Citation accuracy: PaperQA2, scite
- Knowledge breadth: STORM, Perplexity Deep Research
- Screening efficiency: Elicit, ASReview
- Information aggregation: GPT Researcher, OpenAI Deep Research
- Cross-model verification: Microsoft Copilot Critique/Council
None of these categories prioritizes the analytical rigor features described in Q003. The closest approaches are scite's Smart Citations (partial evidence quality) and Microsoft's Critique (partial audit). Both are single-feature implementations.
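For concreteness, the audit pattern behind Microsoft's Critique can be sketched generically. The code below is a minimal illustration of the cross-model pattern, not Microsoft's implementation; `drafter` and `critic` are placeholder callables standing in for real model backends.

```python
from typing import Callable

# Hypothetical sketch of a cross-model audit (NOT Microsoft's actual
# Critique pipeline): one model drafts an answer, a second model audits it.
Model = Callable[[str], str]

def cross_model_audit(drafter: Model, critic: Model, question: str) -> dict[str, str]:
    """Run one draft/critique round and return both outputs."""
    draft = drafter(question)
    critique = critic(
        "Audit the following answer for factual errors, missing caveats, "
        "and bias.\n\nAnswer:\n" + draft
    )
    return {"draft": draft, "critique": critique}

if __name__ == "__main__":
    # Stub models; real use would wrap two different LLM APIs.
    drafter = lambda q: f"Draft answer to: {q}"
    critic = lambda prompt: "Critique: the draft cites no sources."
    print(cross_model_audit(drafter, critic, "What causes auroras?"))
```

Even in this generic form, the pattern covers only the audit feature; confidence calibration and bias assessment would need separate mechanisms, which is the single-feature limitation noted above.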
Relationship to Other Hypotheses
Subsumes H1 (eliminated) and H2 (eliminated). Consistent with the findings from Q001 (no comprehensive prompts) and Q002 (no unified methodology) — the gap in the tools landscape mirrors the gaps in the prompts and methodology landscapes.