Q003 — Self-Audit
Domain 1: Study Eligibility Criteria
| Criterion | Assessment | Notes |
|---|---|---|
| Inclusion criteria clearly defined | Pass | AI-powered research tools with structured analytical features |
| Exclusion criteria clearly defined | Pass | Simple chatbots, general-purpose LLM interfaces |
| Criteria applied consistently | Pass | Same 5-dimension framework applied to all tools |
| Criteria appropriate for the query | Pass | Five dimensions directly from query requirements |
Domain 2: Search Comprehensiveness
| Criterion | Assessment | Notes |
|---|---|---|
| Multiple search strategies used | Pass | 3 searches covering different tool categories |
| Commercial tools searched | Pass | Elicit, Scite, Perplexity, OpenAI, Consensus |
| Open-source tools searched | Pass | GPT-Researcher, STORM, Khoj |
| Academic evaluations searched | Pass | JMIR viewpoint, Cochrane evaluations |
| Search terms varied | Pass | Tool names, feature names, framework names |
Domain 3: Evaluation Consistency
| Criterion | Assessment | Notes |
|---|---|---|
| Same scoring criteria applied | Pass | 5-dimension checklist applied to every tool |
| Commercial/open-source treated equally | Pass | Both categories scored against same framework |
| Source independence assessed | Pass | Multiple independent evaluations used |
| Outliers identified | Pass | STORM identified as closest to competing hypotheses |
Domain 4: Synthesis Fairness
| Criterion | Assessment | Notes |
|---|---|---|
| All evidence considered | Pass | 8 sources, 8 evidence items |
| Alternative interpretations considered | Pass | Three hypotheses including H1 (comprehensive tools exist) |
| Confidence level justified | Pass | High confidence based on comprehensive tool coverage |
| Gaps acknowledged | Pass | Enterprise tools, custom GPTs, rapid evolution |
Domain 5: Source-Back Verification
| Source | Claim Verified | Match |
|---|---|---|
| SRC01 | 99.4% extraction accuracy, systematic review workflow | Match |
| SRC02 | 1.6B+ citations classified supporting/contrasting | Match |
| SRC03 | 220M papers, TLDR summaries | Match |
| SRC04 | Multi-perspective question asking | Match |
| SRC05 | Multi-agent research with citations | Match |
| SRC06 | Incremental progress, citation problems | Match |
| SRC07 | Sentence-level attribution | Match |
| SRC08 | Source traceability, self-hostable | Match |
Overall Assessment
Low risk of bias. Coverage of the tool landscape is comprehensive across commercial, free, and open-source categories, and the five-dimension framework provides consistent evaluation criteria. The primary limitation is the IC-derived framing of the queried dimensions, which may not match what these tools were designed to do.