R0044/2026-04-01/Q002 — Self-Audit¶
ROBIS 4-Domain Audit + Source-Back Verification¶
Domain 1: Eligibility Criteria¶
Rating: Low risk
| Criterion | Assessment |
|---|---|
| Criteria defined before searching | Yes — sought empirical studies with measurable outcomes, not theoretical risk assessments |
| Criteria applied consistently | Yes — distinguished lab evidence from field incidents throughout |
| Criteria shift detected | No |
Notes: Clear distinction maintained between experimental evidence and field incident reports.
Domain 2: Search Comprehensiveness¶
Rating: Low risk
| Criterion | Assessment |
|---|---|
| Multiple search strategies used | Yes — 3 searches targeting sycophancy harms, healthcare AI errors, and military/professional contexts |
| Searches designed to test each hypothesis | Yes — searched for both presence and absence of evidence |
| All results dispositioned | Yes — 50 results returned, all dispositioned |
| Source diversity achieved | Yes — Science, Nature Communications, ISQ, JMIR, Georgetown |
Notes: Good coverage of sycophancy research and of the healthcare and military domains. The engineering and finance domains yielded no domain-specific evidence; this absence is documented as a gap.
Domain 3: Evaluation Consistency¶
Rating: Low risk
| Criterion | Assessment |
|---|---|
| All sources scored using same framework | Yes |
| Evidence typed consistently | Yes — Statistical, Analytical, Reported typing applied |
| ACH matrix applied | Yes |
| Diagnosticity analysis performed | Yes |
Notes: Consistent evaluation across all sources.
Domain 4: Synthesis Fairness¶
Rating: Low risk
| Criterion | Assessment |
|---|---|
| All hypotheses given fair hearing | Yes — H1 (extensive field evidence) was actively searched for |
| Contradictory evidence surfaced | None found — all evidence pointed in the same direction |
| Confidence calibrated to evidence | Yes — Medium reflects strong lab evidence but sparse field documentation |
| Gaps acknowledged | Yes — engineering and finance gaps, absence of incident reporting infrastructure |
Notes: No contradictory evidence was found — all sources agree on the direction of harm. This unanimity is itself a finding worth noting.
Domain 5: Source-Back Verification¶
Rating: Low risk
| Source | Claim in Assessment | Source Actually Says | Match? |
|---|---|---|---|
| SRC01 | AI models affirm users 49% more than humans | Multiple secondary sources confirm this statistic from the Science paper | Yes |
| SRC04 | False confirmation is "most pernicious" error type | Secondary source confirms this characterization | Yes |
| SRC05 | 25-29% switching rates at moderate AI exposure | Directly fetched content confirms these figures | Yes |
Discrepancies found: 0
Corrections applied: None needed
Unresolved flags: None
Notes: The Science paper (SRC01) was not directly accessible (403 error), so statistics rely on multiple consistent secondary sources (Stanford Report, Fortune, Scientific American, AI Business Review). The consistency across independent news sources provides reasonable confidence in the reported figures.
Overall Assessment¶
Overall risk of bias: Low risk
Strong experimental evidence consistently points in one direction. The main limitation is the gap between lab evidence and field documentation.
Researcher Bias Check¶
- Harm-seeking bias: The query specifically asks for evidence of harm, which risks biasing the review toward finding and emphasizing negative findings. Mitigated by clearly noting the lab-vs-field gap, documenting the domain gaps (engineering, finance), and not extrapolating from consumer to professional contexts without evidence.
- Vocabulary bias: The expanded vocabulary search was effective in finding healthcare-specific evidence (false confirmation) that would have been missed with AI-safety-only terminology (sycophancy).