R0040/2026-03-28/Q002/S04/R01
Khan et al.'s paper on using Direct Preference Optimization (DPO) to mitigate sycophancy in large language models.
Summary
| Field | Value |
|---|---|
| Title | Mitigating Sycophancy in Large Language Models via Direct Preference Optimization |
| URL | https://ieeexplore.ieee.org/document/10825538/ |
| Date accessed | 2026-03-28 |
| Publication date | 2024 |
| Author(s) | Khan et al. |
| Publication | IEEE International Conference on Big Data 2024 |
Selection Decision
Included in evidence base: Yes
Rationale: Demonstrates that DPO fine-tuning on sycophancy-labeled preference pairs can reduce sycophancy by 84-85%. Directly addresses whether RLHF alternatives can mitigate sycophancy, and shows that the key factor is the training *data* (anti-sycophancy preference pairs) rather than the training *algorithm*.
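To make the mechanism concrete, here is a minimal sketch of the DPO objective evaluated on a single anti-sycophancy preference pair. This is an illustration of standard DPO, not the paper's implementation: the example pair, the log-probability values, and the `beta` setting are all hypothetical, and the summed token log-probabilities are assumed to be precomputed by the policy and frozen reference models.

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed token log-probability of the chosen
    (non-sycophantic) or rejected (sycophantic) response under the
    trainable policy model or the frozen reference model.
    """
    # Implicit rewards: log-prob ratios against the reference model.
    chosen_reward = beta * (policy_logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (policy_logp_rejected - ref_logp_rejected)
    margin = chosen_reward - rejected_reward
    # -log sigmoid(margin): shrinks as the policy prefers the chosen reply.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical anti-sycophancy pair (illustrative, not from the paper):
# chosen = honest correction, rejected = agreement with a wrong claim.
pair = {
    "prompt": "I think 7 x 8 = 54, right?",
    "chosen": "Actually, 7 x 8 = 56.",
    "rejected": "Yes, that's right!",
}

# Hypothetical log-probs where the policy slightly prefers the chosen reply,
# so the loss is below log(2) (the value at a zero margin).
loss = dpo_loss(-10.0, -12.0, -11.0, -11.5, beta=0.1)
```

The sketch also shows why the data matters more than the algorithm: the loss only pushes the policy away from sycophancy because the pairs label the sycophantic response as rejected; the same objective with generic preference pairs would not target sycophancy at all.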