R0049/2026-03-31/Q003-SRC01-E01
Extract
PaperQA2 outperforms human experts at answering questions across the scientific literature, produces summaries that are on average more factual than Wikipedia, and can detect contradictions at scale. Queries cost $1-$3 each. It implements relevance-based scoring (ranked chunk retrieval + reranking + contextual summarization) but not formal bias assessment, calibrated probability estimates, competing-hypotheses analysis, search transparency logging, or self-audit. Its quality strategy relies on retrieval accuracy and citation grounding rather than analytical rigor frameworks.
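To make the pipeline shape concrete, a minimal, self-contained sketch of the retrieve, rerank, and summarize stages follows. This is an illustration of the general pattern, not PaperQA2's code: every name in it (`embed`, `retrieve`, `rerank`, `summarize_for_query`, `gather_evidence`) is a hypothetical stand-in, and the scoring functions are deliberately trivial placeholders for the embedding model and LLM calls a real system would make.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Stand-in "embedding": bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 5) -> list[str]:
    # Stage 1: ranked chunk retrieval by similarity to the query.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def rerank(query: str, candidates: list[str], k: int = 3) -> list[str]:
    # Stage 2: rerank the shortlist with a finer-grained scorer.
    # A real system would use a cross-encoder or an LLM judge here.
    q_terms = set(query.lower().split())
    return sorted(candidates,
                  key=lambda c: len(q_terms & set(c.lower().split())),
                  reverse=True)[:k]

def summarize_for_query(query: str, chunk: str) -> str:
    # Stage 3: contextual summarization. A real system would prompt an LLM
    # to compress the chunk relative to the query; truncation stands in.
    return chunk[:120]

def gather_evidence(query: str, corpus: list[str]) -> list[str]:
    return [summarize_for_query(query, c)
            for c in rerank(query, retrieve(query, corpus))]

if __name__ == "__main__":
    corpus = [
        "Protein folding is driven by hydrophobic collapse.",
        "Transformer models scale with data and compute.",
        "Chaperone proteins assist folding in the cell.",
    ]
    print(gather_evidence("how do proteins fold", corpus))
```

The design point the extract makes is visible in the sketch: answer quality hinges entirely on how well the retrieve and rerank stages order the evidence; nothing in the pipeline audits the reasoning itself.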
Relevance to Hypotheses
| Hypothesis | Relationship | Strength |
|---|---|---|
| H1 | Contradicts: the most advanced research agent lacks a comprehensive rigor framework | Strong |
| H2 | Supports (feature-specific): none of the five target features is present | Moderate |
| H3 | Supports: high quality without analytical rigor features demonstrates the dominant design pattern | Strong |
Context
PaperQA2 represents the state of the art in AI agents for scientific literature research. Its performance demonstrates that high accuracy is achievable through retrieval engineering alone, which may explain why the field has not invested in analytical rigor frameworks: the accuracy-focused approach "works well enough" for many use cases.
Notes
The contradiction detection feature (ContraCrow) is the closest PaperQA2 comes to a structured analytical methodology, but it operates at the level of individual claim pairs rather than as part of a competing-hypotheses framework.
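To make that distinction concrete, a short sketch contrasting the two levels, assuming a stubbed-out contradiction check (a real system would delegate that judgment to an LLM). The function names and the +1/0/-1 scoring scheme are illustrative assumptions, not PaperQA2 features; the second function follows the Analysis of Competing Hypotheses pattern of weighting disconfirming evidence.

```python
from itertools import combinations

def contradicts(claim_a: str, claim_b: str) -> bool:
    # Stub: a real system would ask an LLM whether two claims conflict.
    a, b = claim_a.lower(), claim_b.lower()
    return (("increases" in a and "decreases" in b) or
            ("decreases" in a and "increases" in b))

def claim_level_check(claims: list[str]) -> list[tuple[str, str]]:
    # Claim-level detection: flag contradictory *pairs* of statements.
    return [(a, b) for a, b in combinations(claims, 2) if contradicts(a, b)]

def least_refuted_hypothesis(evidence: dict[str, dict[str, int]]) -> str:
    # ACH-style analysis: score every piece of evidence against *every*
    # hypothesis (+1 consistent, 0 neutral, -1 inconsistent) and keep the
    # hypothesis with the least disconfirming evidence.
    penalties: dict[str, int] = {}
    for scores in evidence.values():
        for hyp, s in scores.items():
            penalties[hyp] = penalties.get(hyp, 0) + min(s, 0)
    return max(penalties, key=penalties.get)

claims = ["Drug X increases survival", "Drug X decreases survival"]
print(claim_level_check(claims))  # flags the conflicting pair

evidence = {
    "trial A": {"H1: X works": +1, "H2: X is inert": 0},
    "trial B": {"H1: X works": -1, "H2: X is inert": +1},
}
print(least_refuted_hypothesis(evidence))  # -> "H2: X is inert"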