
Research R0058 — Candidate evidence test
Mode: Claim
Run date: 2026-04-03
Claims: 1
Prompt: ai-research-methodology research.md
Model: Claude Opus 4.6

Single-claim research run testing the candidate evidence workflow. The claim asserts specific homophily and bridging statistics for the AI safety and AI ethics research communities. The researcher supplied a candidate evidence URL alongside it.

Claims

C001 — AI safety-ethics homophily — Likely (55-80%)

Claim: AI research communities show 83% homophily between safety and ethics subfields, with only 1% of authors bridging the divide.

Verdict: Partially correct. The 83% homophily figure is confirmed (83.1% in Roytburg & Miller, 2025). The "1% bridging" figure is a mischaracterization. The source reports that the top 1% of authors by network degree control 58% of cross-disciplinary paths. It does not report that only 1% of authors bridge the divide.
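The sketch below makes the verdict's distinction concrete. It is a minimal Python example on a hypothetical 10-author co-authorship graph. The names, labels and edges are invented and are not drawn from the study.

The sketch rests on two assumptions, and the source's actual definitions may differ:
- "Homophily" means the share of co-authorship edges that link two authors of the same subfield.
- "Control of cross-disciplinary paths" is approximated by the share of cross-field edges that touch the highest-degree authors. This is a simplified edge-based stand-in for the source's path-based measure.

```python
from collections import Counter

# Hypothetical toy data: 10 authors labelled by subfield.
# Invented for illustration only; not taken from Roytburg & Miller (2025).
FIELD = {
    "a1": "safety", "a2": "safety", "a3": "safety", "a4": "safety", "a5": "safety",
    "b1": "ethics", "b2": "ethics", "b3": "ethics", "b4": "ethics", "b5": "ethics",
}

# Undirected co-authorship edges (one edge per co-authoring pair).
EDGES = [
    ("a1", "a2"), ("a1", "a3"), ("a1", "a4"), ("a1", "a5"), ("a2", "a3"), ("a4", "a5"),
    ("b1", "b2"), ("b1", "b3"), ("b2", "b4"), ("b3", "b5"), ("b4", "b5"),
    ("a1", "b1"), ("a1", "b2"), ("a2", "b3"),  # the only cross-field edges
]


def is_cross(edge, field):
    """True if the two endpoints belong to different subfields."""
    u, v = edge
    return field[u] != field[v]


def edge_homophily(edges, field):
    """Share of edges whose endpoints share a subfield (assumed definition)."""
    same = sum(1 for e in edges if not is_cross(e, field))
    return same / len(edges)


def bridging_author_share(edges, field):
    """Share of authors with at least one cross-field co-author."""
    bridgers = {a for e in edges if is_cross(e, field) for a in e}
    return len(bridgers) / len(field)


def top_degree_cross_share(edges, field, top_frac):
    """Share of cross-field edges touching the top `top_frac` of authors by degree.

    Simplified edge-based stand-in for the source's path-based measure.
    """
    degree = Counter(a for e in edges for a in e)
    k = max(1, int(len(field) * top_frac))  # always keep at least one author
    top = set(sorted(field, key=lambda a: degree[a], reverse=True)[:k])
    cross = [e for e in edges if is_cross(e, field)]
    if not cross:
        return 0.0
    touched = sum(1 for u, v in cross if u in top or v in top)
    return touched / len(cross)


if __name__ == "__main__":
    print(f"edge homophily:      {edge_homophily(EDGES, FIELD):.2f}")
    print(f"bridging authors:    {bridging_author_share(EDGES, FIELD):.2f}")
    print(f"top-10% cross share: {top_degree_cross_share(EDGES, FIELD, 0.10):.2f}")
```

On this toy graph the script prints 0.79, 0.50 and 0.67. Top 10% is used instead of 1% only because the graph is tiny. The result shows that one hub touches two thirds of the cross-field edges while half of all authors still bridge. A concentration statistic about top-degree authors therefore says nothing direct about how many authors bridge.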

Hypothesis | Status | Probability
H1: Accurate as stated | Eliminated | n/a
H2: Partially correct | Supported | Likely (55-80%)
H3: Materially wrong | Eliminated | n/a

Confidence: High · Sources: 4 · Searches: 5



Collection Analysis

Cross-Cutting Patterns

Pattern | Claims affected | Significance
Single-study dependence | C001 | The specific numerical claims rest on a single unreplicated preprint (Roytburg & Miller, 2025). Independent sources corroborate the phenomenon qualitatively but not the specific numbers.
Candidate evidence as primary source | C001 | The researcher-provided candidate evidence turned out to be the primary (and only direct) source for the claim's numerical assertions. The candidate evidence workflow correctly identified a mischaracterization despite the researcher having pre-identified the source.

Collection Statistics

Metric | Value
Claims investigated | 1
Confirmed with caveats (Likely) | 1 (C001)

Source Independence Assessment

Four sources were scored. Three of them (SRC02, SRC03, SRC04) are fully independent of the primary source (SRC01). However, this independence adds limited weight. The three independent sources provide contextual corroboration rather than a replication of the measurement. The primary quantitative finding (83.1% homophily) rests entirely on SRC01.

Collection Gaps

Gap | Impact | Mitigation
No replication of homophily measurement | Moderate (single-study risk) | Methodology appears rigorous; corroborated qualitatively by 3 independent sources
No data on actual percentage of bridging authors | Low (affects only the "1% bridging" component, which is already flagged as mischaracterized) | The 9.5% mixed-paper rate provides a partial proxy
Study venue coverage limited to 12 ML/NLP conferences | Low-moderate (may undercount cross-field work in other venues) | Acknowledged in assessment; future studies with broader venue coverage would address this
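The 9.5% mixed-paper rate noted in the gaps is a paper-level statistic, whereas the claim concerns an author-level one. That is why it serves only as a partial proxy.

The minimal Python sketch below shows how the two rates can diverge. It uses assumed definitions: a "mixed" paper has authors from both subfields, and a "bridging" author appears on at least one mixed paper. The corpus is hypothetical and is not drawn from the study.

```python
# Hypothetical toy corpus: invented author labels and papers, not study data.
AUTHOR_FIELD = {
    "s1": "safety", "s2": "safety", "s3": "safety", "s4": "safety",
    "e1": "ethics", "e2": "ethics", "e3": "ethics", "e4": "ethics",
}

# Each paper is the tuple of its authors.
PAPERS = [
    ("s1", "s2"), ("s2", "s3"), ("s3", "s4"), ("s1", "s4"), ("s2", "s3"),
    ("e1", "e2"), ("e2", "e3"), ("e3", "e4"), ("e1", "e4"),
    ("s1", "e1", "e2"),  # the only paper mixing both subfields
]


def is_mixed(paper, field):
    """A paper is 'mixed' if its authors span more than one subfield (assumed)."""
    return len({field[a] for a in paper}) > 1


def mixed_paper_rate(papers, field):
    """Paper-level statistic: share of papers that are mixed."""
    return sum(is_mixed(p, field) for p in papers) / len(papers)


def bridging_author_rate(papers, field):
    """Author-level statistic: share of authors on at least one mixed paper."""
    bridgers = {a for p in papers if is_mixed(p, field) for a in p}
    return len(bridgers) / len(field)


if __name__ == "__main__":
    print(f"mixed-paper rate:     {mixed_paper_rate(PAPERS, AUTHOR_FIELD):.3f}")
    print(f"bridging-author rate: {bridging_author_rate(PAPERS, AUTHOR_FIELD):.3f}")
```

Here 10% of papers are mixed, but 37.5% of authors bridge. A single mixed paper counts once at the paper level, yet every one of its authors counts at the author level. In a real corpus, the size and direction of this gap depend on author-list sizes and productivity. The 9.5% rate therefore cannot be converted into a bridging-author share without the underlying data.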

Collection Self-Audit

Domain | Rating | Notes
Eligibility criteria | Low risk | Clear numerical claim with well-defined evidence criteria
Search comprehensiveness | Some concerns | 5 searches across multiple strategies, but the topic is niche with limited literature
Evaluation consistency | Low risk | All sources scored using the same framework; candidate evidence given no special treatment
Synthesis fairness | Low risk | Finding of partial incorrectness demonstrates resistance to confirmation bias

Resources

Summary

Metric | Value
Claims investigated | 1
Files produced | 83
Sources scored | 4
Evidence extracts | 6
Results dispositioned | 9 selected + 52 rejected = 61 total

Tool Breakdown

Tool | Uses | Purpose
WebSearch | 6 | Search queries across multiple strategies
WebFetch | 4 | Page content retrieval (candidate evidence + 3 sources)
Write | 60 | File creation (all output files)
Read | 2 | Reading methodology and output format specs
Edit | 0 | No file modifications needed
Bash | 2 | Directory creation and stub file generation

Token Distribution

Category | Tokens
Input (context) | ~150,000
Output (generation) | ~30,000
Total | ~180,000