
Q003 — Self-Audit

Domain 1: Study Eligibility Criteria

| Criterion | Assessment | Notes |
|---|---|---|
| Inclusion criteria clearly defined | Pass | AI-powered research tools with structured analytical features |
| Exclusion criteria clearly defined | Pass | Simple chatbots and general-purpose LLM interfaces |
| Criteria applied consistently | Pass | Same five-dimension framework applied to all tools |
| Criteria appropriate for the query | Pass | Five dimensions taken directly from the query requirements |

Domain 2: Search Comprehensiveness

| Criterion | Assessment | Notes |
|---|---|---|
| Multiple search strategies used | Pass | 3 searches covering different tool categories |
| Commercial tools searched | Pass | Elicit, Scite, Perplexity, OpenAI, Consensus |
| Open-source tools searched | Pass | GPT-Researcher, STORM, Khoj |
| Academic evaluations searched | Pass | JMIR viewpoint, Cochrane evaluations |
| Search terms varied | Pass | Tool names, feature names, framework names |

Domain 3: Evaluation Consistency

| Criterion | Assessment | Notes |
|---|---|---|
| Same scoring criteria applied | Pass | Five-dimension checklist applied to every tool |
| Commercial and open-source tools treated equally | Pass | Both categories scored against the same framework |
| Source independence assessed | Pass | Multiple independent evaluations used |
| Outliers identified | Pass | STORM identified as the closest match to a competing-hypotheses approach |

Domain 4: Synthesis Fairness

| Criterion | Assessment | Notes |
|---|---|---|
| All evidence considered | Pass | 8 sources, 8 evidence items |
| Alternative interpretations considered | Pass | Three hypotheses, including H1 (comprehensive tools exist) |
| Confidence level justified | Pass | High confidence, based on comprehensive tool coverage |
| Gaps acknowledged | Pass | Enterprise tools, custom GPTs, rapid evolution of the tool landscape |

Domain 5: Source-Back Verification

| Source | Claim Verified | Match |
|---|---|---|
| SRC01 | 99.4% extraction accuracy; systematic review workflow | Match |
| SRC02 | 1.6B+ citations classified as supporting or contrasting | Match |
| SRC03 | 220M papers; TLDR summaries | Match |
| SRC04 | Multi-perspective question asking | Match |
| SRC05 | Multi-agent research with citations | Match |
| SRC06 | Incremental progress; citation problems | Match |
| SRC07 | Sentence-level attribution | Match |
| SRC08 | Source traceability; self-hostable | Match |

Overall Assessment

Low risk of bias. Coverage of the tool landscape is comprehensive across commercial, free, and open-source categories, and the five-dimension framework provides consistent evaluation criteria. The primary limitation is the IC-derived framing of the queried dimensions, which may not match what these tools were designed to do.