R0040/2026-03-28/Q002/S04/R01
Khan et al.'s paper on using Direct Preference Optimization (DPO) to mitigate sycophancy in large language models.
Summary
| Field | Value |
|---|---|
| Title | Mitigating Sycophancy in Large Language Models via Direct Preference Optimization |
| URL | https://ieeexplore.ieee.org/document/10825538/ |
| Date accessed | 2026-03-28 |
| Publication date | 2024 |
| Author(s) | Khan et al. |
| Publication | IEEE International Conference on Big Data 2024 |
Selection Decision
Included in evidence base: Yes
Rationale: Demonstrates that DPO fine-tuning on sycophancy-labeled preference pairs can reduce sycophancy by 84-85%. Directly addresses whether RLHF alternatives can mitigate sycophancy, and shows that the key factor is the training *data* (anti-sycophancy preference pairs) rather than the training *algorithm*.
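To make the mechanism concrete, here is a minimal sketch of the DPO objective evaluated on a single anti-sycophancy preference pair. This is an illustration of standard DPO, not the paper's implementation: the example pair, the log-probability values, and the `beta` setting are all hypothetical, and the summed token log-probabilities are assumed to be precomputed by the policy and frozen reference models.

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed token log-probability of the chosen
    (non-sycophantic) or rejected (sycophantic) response under the
    trainable policy model or the frozen reference model.
    """
    # Implicit rewards: log-prob ratios against the reference model.
    chosen_reward = beta * (policy_logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (policy_logp_rejected - ref_logp_rejected)
    margin = chosen_reward - rejected_reward
    # -log sigmoid(margin): shrinks as the policy prefers the chosen reply.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical anti-sycophancy pair (illustrative, not from the paper):
# chosen = honest correction, rejected = agreement with a wrong claim.
pair = {
    "prompt": "I think 7 x 8 = 54, right?",
    "chosen": "Actually, 7 x 8 = 56.",
    "rejected": "Yes, that's right!",
}

# Hypothetical log-probs where the policy slightly prefers the chosen reply,
# so the loss is below log(2) (the value at a zero margin).
loss = dpo_loss(-10.0, -12.0, -11.0, -11.5, beta=0.1)
```

The sketch also shows why the data matters more than the algorithm: the loss only pushes the policy away from sycophancy because the pairs label the sycophantic response as rejected; the same objective with generic preference pairs would not target sycophancy at all.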