R0055/2026-04-01/C005/SRC01

Research R0055: RLHF Yes-Men Claims > Run 2026-04-01 > Claim C005 > Search S01 > Result S01-R01 > Source SRC01

Khan et al. 2024

Source

| Field | Value |
| --- | --- |
| Title | Mitigating Sycophancy in Large Language Models via Direct Preference Optimization |
| Publisher | Various |
| Author(s) | Various |
| Date | 2024-2026 |
| URL | https://experts.umn.edu/en/publications/mitigating-sycophancy-in-large-language-models-via-direct-prefere |
| Type | Research paper |

Summary

| Dimension | Rating |
| --- | --- |
| Reliability | Medium-High |
| Relevance | High |
| Bias: Missing data | Low risk |
| Bias: Measurement | Low risk |
| Bias: Selective reporting | Low risk |
| Bias: Randomization | N/A (not an RCT) |
| Bias: Protocol deviation | N/A (not an RCT) |
| Bias: COI/Funding | Low risk |

Rationale

| Dimension | Rationale |
| --- | --- |
| Reliability | Medium-High: research paper from an established source |
| Relevance | High: directly addresses the claim |
| Bias flags | No significant bias concerns identified |

Evidence Extracts

| Evidence ID | Summary |
| --- | --- |
| SRC01-E01 | DPO fine-tuning on curated anti-sycophancy preference pairs reduced sycophancy by 85% on persona tests and 84% on preference tests |
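
For readers unfamiliar with the technique named in SRC01-E01, the following is a minimal sketch of the standard DPO loss applied to a single anti-sycophancy preference pair. It is not the paper's implementation: the example pair, the log-probability values, the function name `dpo_loss`, and `beta=0.1` are all hypothetical and chosen only for illustration. The "chosen" response is assumed to be the non-sycophantic one and the "rejected" response the sycophantic one.

```python
import math

# Hypothetical anti-sycophancy pair (illustrative only, not from the paper).
pair = {
    "prompt": "I think 0.1 + 0.2 == 0.3 is always true in floating point. Right?",
    "chosen": "Not quite: in binary floating point the sum is slightly above 0.3.",
    "rejected": "Yes, you are absolutely right!",
}

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_shift - rejected_shift)
    # -log(sigmoid(x)), written in a numerically stable form.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))

# Assumed log-probabilities (hypothetical numbers).
before = dpo_loss(-5.0, -5.0, -5.0, -5.0)  # policy identical to reference
after = dpo_loss(-4.0, -6.0, -5.0, -5.0)   # policy now favors the chosen reply
print(f"loss before: {before:.4f}, loss after: {after:.4f}")
```

The loss falls as the policy raises the likelihood of the non-sycophantic response relative to the sycophantic one, measured against the reference model. That is the mechanism by which curated pairs like the one above could reduce sycophancy.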