
R0049/2026-03-31/Q003-SRC01-E01

Research R0049 — Landscape Scan
Run 2026-03-31
Query Q003
Source SRC01
Evidence E01

Extract

PaperQA2 outperforms human experts at answering questions across the scientific literature, produces summaries that are on average more factual than Wikipedia articles, and can detect contradictions at scale, at a cost of $1-$3 per query. It implements relevance-based scoring (ranked chunk retrieval + reranking + contextual summarization) but not formal bias assessment, calibrated probabilities, competing hypotheses, search-transparency logging, or self-audit. Its quality strategy relies on retrieval accuracy and citation grounding rather than on analytical-rigor frameworks.
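The pipeline named in the extract (ranked chunk retrieval, then reranking, then contextual summarization) can be sketched in miniature. This is a hedged illustration only: the lexical scoring and all function names below are hypothetical stand-ins, not PaperQA2's actual implementation, which uses embedding search and LLM calls at each stage.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str

def retrieve(chunks, query, k=5):
    # Stage 1: ranked chunk retrieval. Toy lexical overlap stands in
    # for embedding-based similarity search.
    terms = set(query.lower().split())
    scored = [(len(terms & set(c.text.lower().split())), c) for c in chunks]
    scored.sort(key=lambda pair: -pair[0])
    return [c for _, c in scored[:k]]

def rerank(candidates, query):
    # Stage 2: reranking. A phrase-match heuristic stands in for an
    # LLM reranker; chunks containing the full query phrase sort first.
    return sorted(candidates, key=lambda c: query.lower() not in c.text.lower())

def contextual_summary(chunk, query):
    # Stage 3: contextual summarization. A truncated snippet stands in
    # for a per-chunk LLM summary; the doc_id prefix keeps the citation
    # attached so the final answer stays grounded in its source.
    return f"[{chunk.doc_id}] {chunk.text[:80]}"

def answer(chunks, query, k=3):
    ranked = rerank(retrieve(chunks, query, k=k), query)
    return [contextual_summary(c, query) for c in ranked]
```

Carrying the citation through the summarization stage is the design choice that lets this style of system ground every claim without any separate rigor framework.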

Relevance to Hypotheses

| Hypothesis | Relationship | Strength |
| --- | --- | --- |
| H1 | Contradicts: the most advanced research agent lacks a comprehensive framework | Strong |
| H2 | Supports (for specific features): none of the five target features is present | Moderate |
| H3 | Supports: high quality without analytical-rigor features demonstrates the dominant design pattern | Strong |

Context

PaperQA2 represents the state of the art in AI-assisted scientific literature research. Its performance demonstrates that high accuracy is achievable through retrieval engineering alone, which may explain why the field has not invested in analytical-rigor frameworks: the accuracy-focused approach "works well enough" for many use cases.

Notes

The contradiction-detection feature (ContraCrow) is the closest PaperQA2 comes to a structured analytical methodology, but it operates at the level of individual claims rather than as part of a competing-hypotheses framework.
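The distinction drawn here can be made concrete with a toy sketch. All names below are hypothetical, and the keyword check stands in for the LLM judgment ContraCrow actually uses: claim-level detection compares extracted claims pairwise, while a competing-hypotheses framework scores the same piece of evidence against every hypothesis at once, as in the table above.

```python
# Claim-level contradiction detection: pairwise comparison of claims.
def contradicts(claim_a, claim_b):
    # Stand-in for an LLM judgment; a real system would not keyword-match.
    negation_pairs = {("increases", "decreases"), ("decreases", "increases")}
    return any(a in claim_a and b in claim_b for a, b in negation_pairs)

def find_contradictions(claims):
    # Flags conflicting claim pairs, but carries no notion of which
    # overall hypothesis each claim bears on.
    return [
        (a, b)
        for i, a in enumerate(claims)
        for b in claims[i + 1:]
        if contradicts(a, b)
    ]

# Competing-hypotheses framework: one piece of evidence is scored
# against every hypothesis simultaneously.
def hypothesis_matrix(evidence, hypotheses, judge):
    # `judge` stands in for an analyst or model rating the relationship.
    return {h: judge(evidence, h) for h in hypotheses}
```

The first function answers "do these two statements conflict?"; the second answers "what does this evidence do to each live hypothesis?", which is the analytical step the note says PaperQA2 does not take.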