R0058/2026-04-03


Research	R0058 — Candidate evidence test
Mode	Claim
Run date	2026-04-03
Claims	1
Prompt	ai-research-methodology research.md
Model	Claude Opus 4.6

Single-claim research run testing the candidate evidence workflow. The claim asserts specific homophily and bridging statistics for the AI safety and AI ethics research communities, and the researcher supplied a candidate evidence URL alongside it.

Claims

C001 — AI safety-ethics homophily — Likely (55-80%)

Claim: AI research communities show 83% homophily between safety and ethics subfields, with only 1% of authors bridging the divide.

Verdict: Partially correct. The 83% homophily figure is confirmed (83.1% in Roytburg & Miller, 2025). The "1% bridging" figure is a mischaracterization: the source reports that the top 1% of authors by network degree control 58% of cross-disciplinary paths. It does not report that only 1% of authors bridge the two subfields.

Hypothesis	Status	Probability
H1: Accurate as stated	Eliminated	—
H2: Partially correct	Supported	Likely (55-80%)
H3: Materially wrong	Eliminated	—

Confidence: High · Sources: 4 · Searches: 5
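
The distinction drawn in the C001 verdict can be illustrated on a toy co-authorship graph. The sketch below is not the method of Roytburg & Miller (2025): the graph, the author names, the helper functions, and the use of cross-field edges as a stand-in for cross-disciplinary paths are all assumptions made for illustration. It computes three separate quantities: edge homophily, the share of authors with any cross-field tie, and the share of cross-field ties that touch the highest-degree authors.

```python
from collections import Counter

# Hypothetical toy data: 5 "safety" authors (s0..s4), 5 "ethics" authors (e0..e4).
field = {f"s{i}": "safety" for i in range(5)}
field.update({f"e{i}": "ethics" for i in range(5)})

edges = [
    # s0 is a hub with ties in both fields
    ("s0", "s1"), ("s0", "s2"), ("s0", "s3"), ("s0", "s4"),
    ("s0", "e0"), ("s0", "e1"), ("s0", "e2"),
    # within-field ties
    ("s1", "s2"), ("s3", "s4"),
    ("e0", "e1"), ("e1", "e2"), ("e2", "e3"), ("e3", "e4"), ("e0", "e4"),
    # one more cross-field tie that does not involve the hub
    ("s4", "e4"),
]


def edge_homophily(edges, field):
    """Fraction of edges whose two endpoints share a field."""
    same = sum(1 for a, b in edges if field[a] == field[b])
    return same / len(edges)


def bridging_author_share(edges, field):
    """Fraction of all authors with at least one cross-field edge."""
    bridgers = set()
    for a, b in edges:
        if field[a] != field[b]:
            bridgers.update((a, b))
    return len(bridgers) / len(field)


def top_degree_cross_share(edges, field, top_fraction):
    """Fraction of cross-field edges that touch a top-degree author."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    k = max(1, round(top_fraction * len(field)))  # number of "top" authors
    top = {author for author, _ in degree.most_common(k)}
    cross = [(a, b) for a, b in edges if field[a] != field[b]]
    return sum(1 for a, b in cross if a in top or b in top) / len(cross)


print(f"homophily:              {edge_homophily(edges, field):.2f}")               # 0.73
print(f"authors who bridge:     {bridging_author_share(edges, field):.2f}")        # 0.60
print(f"cross ties via top 10%: {top_degree_cross_share(edges, field, 0.10):.2f}")  # 0.75
```

In this toy graph 60% of authors have at least one cross-field tie, yet the single highest-degree author is involved in 75% of those ties. A concentration statistic about the top 1% of authors therefore says nothing direct about how many authors bridge the two fields.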

Full analysis

Collection Analysis

Cross-Cutting Patterns

Pattern	Claims Affected	Significance
Single-study dependence	C001	The specific numerical claims rest on a single unreplicated preprint (Roytburg & Miller, 2025). Independent sources corroborate the phenomenon qualitatively but not the specific numbers.
Candidate evidence as primary source	C001	The researcher-provided candidate evidence turned out to be the primary (and only direct) source for the claim's numerical assertions. The candidate evidence workflow correctly identified a mischaracterization despite the researcher having pre-identified the source.

Collection Statistics

Metric	Value
Claims investigated	1
Confirmed with caveats (Likely)	1 (C001)

Source Independence Assessment

Four sources were scored. Three of them (SRC02, SRC03, SRC04) are fully independent of the primary source (SRC01), but their independence has limited impact: they corroborate the phenomenon contextually and do not replicate the measurement. The primary quantitative finding (83.1% homophily) therefore rests entirely on SRC01.

Collection Gaps

Gap	Impact	Mitigation
No replication of homophily measurement	Moderate — single-study risk	Methodology appears rigorous; corroborated qualitatively by 3 independent sources
No data on actual percentage of bridging authors	Low — affects only the "1% bridging" component which is already flagged as mischaracterized	The 9.5% mixed-paper rate provides a partial proxy
Study venue coverage limited to 12 ML/NLP conferences	Low-moderate — may undercount cross-field work in other venues	Acknowledged in assessment; future studies with broader venue coverage would address this

Collection Self-Audit

Domain	Rating	Notes
Eligibility criteria	Low risk	Clear numerical claim with well-defined evidence criteria
Search comprehensiveness	Some concerns	5 searches across multiple strategies, but the topic is niche with limited literature
Evaluation consistency	Low risk	All sources scored using the same framework; candidate evidence given no special treatment
Synthesis fairness	Low risk	Reaching a partially correct verdict against researcher-supplied evidence suggests resistance to confirmation bias

Resources

Summary

Metric	Value
Claims investigated	1
Files produced	83
Sources scored	4
Evidence extracts	6
Results dispositioned	9 selected + 52 rejected = 61 total

Tool Breakdown

Tool	Uses	Purpose
WebSearch	6	Search queries across multiple strategies
WebFetch	4	Page content retrieval (candidate evidence + 3 sources)
Write	60	File creation (all output files)
Read	2	Reading methodology and output format specs
Edit	0	No file modifications needed
Bash	2	Directory creation and stub file generation

Token Distribution

Category	Tokens
Input (context)	~150,000
Output (generation)	~30,000
Total	~180,000