

Research R0040 — RLHF Alternatives
Run 2026-03-28
Query Q002
Search S04
Result S04-R01

Khan et al.'s paper on using Direct Preference Optimization (DPO) to mitigate sycophancy in large language models.

Summary

Title: Mitigating Sycophancy in Large Language Models via Direct Preference Optimization
URL: https://ieeexplore.ieee.org/document/10825538/
Date accessed: 2026-03-28
Publication date: 2024
Author(s): Khan et al.
Publication: IEEE International Conference on Big Data 2024

Selection Decision

Included in evidence base: Yes

Rationale: Demonstrates that DPO trained on sycophancy-labeled preference pairs can reduce sycophancy by 84-85%. Directly addresses whether RLHF alternatives can mitigate sycophancy, and shows that the key factor is the training DATA (anti-sycophancy preference pairs) rather than the training ALGORITHM.
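
For reference, the standard DPO objective the paper builds on scores each preference pair by the policy's log-probability margin over a frozen reference model. The sketch below is not the paper's implementation; it is a minimal illustration of the DPO loss, assuming summed token log-probabilities are already available, with the "chosen" response being the non-sycophantic one. All function and variable names are hypothetical.

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Inputs are summed token log-probs of the chosen (non-sycophantic)
    and rejected (sycophantic) responses under the trained policy and
    a frozen reference model. beta scales deviation from the reference.
    """
    chosen_logratio = policy_chosen_lp - ref_chosen_lp
    rejected_logratio = policy_rejected_lp - ref_rejected_lp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log sigmoid(margin): small when the policy prefers the chosen
    # (non-sycophantic) response relative to the reference model.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A policy favoring the non-sycophantic answer incurs a lower loss
# than one favoring the sycophantic answer (toy log-prob values).
aligned = dpo_loss(-10.0, -14.0, -12.0, -12.0)
sycophantic = dpo_loss(-14.0, -10.0, -12.0, -12.0)
```

The point relevant to the selection rationale: the loss itself is generic preference learning; the anti-sycophancy effect comes entirely from which response is labeled "chosen" in the training pairs.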