R0044/2026-04-01/Q002/SRC04/E01

Research R0044 — Expanded Vocabulary Research
Run 2026-04-01
Query Q002
Source SRC04
Evidence SRC04-E01
Type Statistical

False confirmation errors in AI-assisted clinical diagnosis

URL: https://www.nature.com/articles/s41467-024-50952-3

Extract

Key findings on false confirmation in AI medical decision-making:

  • False confirmation defined: the AI's diagnosis agrees with the clinician's initial (incorrect) hypothesis, and the clinician proceeds with the wrong diagnosis because the AI confirmed it. This is the healthcare-specific mechanism closest to "sycophancy": the AI appears to agree with an incorrect assumption.
  • Explanation paradox: adding explanations to AI diagnoses (XAI — explainable AI) mitigates true conflict errors but exacerbates false conflict errors. Mere exposure to explanations induces overreliance on the AI.
  • False confirmation is "perhaps the most pernicious" error type: It reinforces trust in AI while perpetuating clinical errors, described as the healthcare equivalent of confirmation bias.
  • Top AI models generated severely harmful clinical recommendations up to 22.2% of the time, with the best-performing models producing 12-15 errors per 100 cases.
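The error types above can be read as a 2x2 taxonomy: whether the clinician's initial hypothesis is correct, crossed with whether the AI agrees with it. The sketch below is an illustrative mapping under that assumption — the function and labels are not from the paper's methods, only a reading of the terminology in this extract.

```python
def classify_outcome(clinician_correct: bool, ai_agrees: bool) -> str:
    """Toy mapping of a clinician-AI interaction to an outcome type.

    Assumes the 2x2 reading: (clinician hypothesis correct?) x (AI agrees?).
    """
    if ai_agrees:
        # AI confirms the clinician's initial hypothesis.
        return "true confirmation" if clinician_correct else "false confirmation"
    # AI contradicts the clinician's initial hypothesis.
    return "false conflict" if clinician_correct else "true conflict"

# The pernicious case: the clinician is wrong and the AI agrees anyway.
print(classify_outcome(clinician_correct=False, ai_agrees=True))
# -> false confirmation
```

On this reading, false confirmation is the only cell where both parties err in the same direction, which is why it reinforces trust while perpetuating the mistake.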

Relevance to Hypotheses

Hypothesis | Relationship | Rationale
H1 | Supports | Peer-reviewed evidence of a measurable harm mechanism in a specific professional domain (healthcare)
H2 | Supports strongly | Experimental evidence in a high-stakes professional context; mechanism analysis rather than a field incident report
H3 | Contradicts strongly | Quantified error rates in clinical AI demonstrate the concern is empirically grounded

Context

This study uses the term "false confirmation" rather than "sycophancy" — a critical vocabulary difference. The mechanism is identical: the AI confirms an incorrect human belief, and the human proceeds with increased (misplaced) confidence. This is precisely the kind of domain-specific vocabulary the expanded search was designed to surface.

Notes

The 12-22% rate of severely harmful recommendations is particularly concerning because these are AI systems already deployed or under evaluation for clinical use, not hypothetical scenarios.