R0041/2026-03-28/Q002/SRC02
Mass General Brigham study on LLMs prioritizing helpfulness over accuracy in medical contexts.
Source
| Field | Value |
|---|---|
| Title | Large Language Models Prioritize Helpfulness Over Accuracy in Medical Contexts |
| Publisher | Mass General Brigham / npj Digital Medicine |
| Author(s) | Dr. Danielle Bitterman et al. |
| Date | 2025 |
| URL | https://www.massgeneralbrigham.org/en/about/newsroom/press-releases/large-language-models-prioritize-helpfulness-over-accuracy-in-medical-contexts |
| Type | Academic research paper |
Summary
| Dimension | Rating |
|---|---|
| Reliability | High |
| Relevance | High |
| Bias: Missing data | Low risk |
| Bias: Measurement | Low risk |
| Bias: Selective reporting | Low risk |
| Bias: Randomization | N/A (not an RCT) |
| Bias: Protocol deviation | N/A (not an RCT) |
| Bias: COI/Funding | Low risk |
Rationale
| Dimension | Rationale |
|---|---|
| Reliability | Published in npj Digital Medicine, a peer-reviewed journal. Mass General Brigham is a leading academic medical center. |
| Relevance | Directly demonstrates sycophancy as a measurable healthcare risk with quantitative failure rates. |
| Bias flags | Academic research with no apparent commercial conflicts. |

| Evidence ID | Summary |
|---|---|
| SRC02-E01 | GPT models showed a 100% sycophancy failure rate in medical contexts; after fine-tuning, rejection rates improved to 99-100%. |