R0058/2026-04-03
Single-claim research run testing the candidate evidence workflow. The claim asserts specific homophily and bridging statistics about AI safety-ethics research communities, with a researcher-provided candidate evidence URL.
Claims
C001 — AI safety-ethics homophily — Likely (55-80%)
Claim: AI research communities show 83% homophily between safety and ethics subfields, with only 1% of authors bridging the divide.
Verdict: Partially correct. The 83% homophily figure is confirmed (83.1% in Roytburg & Miller, 2025). The "1% bridging" figure is a mischaracterization: the source reports that the top 1% of authors by network degree control 58% of cross-disciplinary paths, not that only 1% of authors bridge the two subfields.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Accurate as stated | Eliminated | n/a |
| H2: Partially correct | Supported | Likely (55-80%) |
| H3: Materially wrong | Eliminated | n/a |
Confidence: High · Sources: 4 · Searches: 5
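The distinction drawn in the verdict can be made concrete with a toy example. The sketch below is not the method used by Roytburg & Miller (2025): the author names, field labels, and edges are invented for illustration, and "cross-disciplinary paths" are approximated by direct cross-field co-authorship edges, which is a simplifying assumption. It shows that the share of authors who bridge at all and the share of cross-field links concentrated in the highest-degree authors are different quantities.

```python
# Toy co-authorship network (hypothetical data, not taken from the study).
field = {
    "s1": "safety", "s2": "safety", "s3": "safety", "s4": "safety", "s5": "safety",
    "e1": "ethics", "e2": "ethics", "e3": "ethics", "e4": "ethics", "e5": "ethics",
}
edges = [
    ("s1", "s2"), ("s1", "s3"), ("s1", "s4"), ("s2", "s3"), ("s4", "s5"),  # safety-safety
    ("e1", "e2"), ("e1", "e3"), ("e2", "e3"), ("e3", "e4"), ("e4", "e5"),  # ethics-ethics
    ("s1", "e1"), ("s1", "e2"), ("s2", "e5"),                              # cross-field
]


def edge_homophily(edges, field):
    """Share of edges whose two endpoints belong to the same field."""
    same = sum(1 for a, b in edges if field[a] == field[b])
    return same / len(edges)


def bridging_author_share(edges, field):
    """Share of authors with at least one cross-field edge."""
    bridging = set()
    for a, b in edges:
        if field[a] != field[b]:
            bridging.update((a, b))
    return len(bridging) / len(field)


def top_degree_cross_share(edges, field, top_k):
    """Share of cross-field edges touching one of the top_k authors by degree."""
    degree = {author: 0 for author in field}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Sort by descending degree, breaking ties by name for determinism.
    top = set(sorted(degree, key=lambda author: (-degree[author], author))[:top_k])
    cross = [(a, b) for a, b in edges if field[a] != field[b]]
    touched = sum(1 for a, b in cross if a in top or b in top)
    return touched / len(cross)


print(f"edge homophily:          {edge_homophily(edges, field):.2f}")
print(f"bridging author share:   {bridging_author_share(edges, field):.2f}")
print(f"top-1 author cross share: {top_degree_cross_share(edges, field, 1):.2f}")
```

In this toy network, half of the authors have at least one cross-field link, yet the single highest-degree author touches two of the three cross-field edges. A concentration statistic of the second kind therefore says nothing about how many authors bridge, which is the gap between the claim's wording and what the source reports.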
Full analysis
Collection Analysis
Cross-Cutting Patterns
| Pattern | Claims Affected | Significance |
|---|---|---|
| Single-study dependence | C001 | The specific numerical claims rest on a single unreplicated preprint (Roytburg & Miller, 2025). Independent sources corroborate the phenomenon qualitatively but not the specific numbers. |
| Candidate evidence as primary source | C001 | The researcher-provided candidate evidence turned out to be the primary (and only direct) source for the claim's numerical assertions. The candidate evidence workflow correctly identified a mischaracterization despite the researcher having pre-identified the source. |
Collection Statistics
| Metric | Value |
|---|---|
| Claims investigated | 1 |
| Confirmed with caveats (Likely) | 1 (C001) |
Source Independence Assessment
Four sources were scored, of which three (SRC02, SRC03, SRC04) are fully independent of the primary source (SRC01). That independence carries limited weight, however, because the three independent sources provide contextual corroboration rather than a replication of the measurement. The primary quantitative finding (83.1% homophily) rests entirely on SRC01.
Collection Gaps
| Gap | Impact | Mitigation |
|---|---|---|
| No replication of homophily measurement | Moderate: single-study risk | Methodology appears rigorous; corroborated qualitatively by 3 independent sources |
| No data on actual percentage of bridging authors | Low: affects only the "1% bridging" component, which is already flagged as mischaracterized | The 9.5% mixed-paper rate provides a partial proxy |
| Study venue coverage limited to 12 ML/NLP conferences | Low-moderate: may undercount cross-field work in other venues | Acknowledged in assessment; future studies with broader venue coverage would address this |
Collection Self-Audit
| Domain | Rating | Notes |
|---|---|---|
| Eligibility criteria | Low risk | Clear numerical claim with well-defined evidence criteria |
| Search comprehensiveness | Some concerns | 5 searches across multiple strategies, but the topic is niche with limited literature |
| Evaluation consistency | Low risk | All sources scored using the same framework; candidate evidence given no special treatment |
| Synthesis fairness | Low risk | Finding of partial incorrectness demonstrates resistance to confirmation bias |
Resources
Summary
| Metric | Value |
|---|---|
| Claims investigated | 1 |
| Files produced | 83 |
| Sources scored | 4 |
| Evidence extracts | 6 |
| Results dispositioned | 9 selected + 52 rejected = 61 total |
| Tool | Uses | Purpose |
|---|---|---|
| WebSearch | 6 | Search queries across multiple strategies |
| WebFetch | 4 | Page content retrieval (candidate evidence + 3 sources) |
| Write | 60 | File creation (all output files) |
| Read | 2 | Reading methodology and output format specs |
| Edit | 0 | No file modifications needed |
| Bash | 2 | Directory creation and stub file generation |
Token Distribution
| Category | Tokens |
|---|---|
| Input (context) | ~150,000 |
| Output (generation) | ~30,000 |
| Total | ~180,000 |