R0040/2026-03-28/Q002/SRC05
DPO-based sycophancy mitigation achieving 84-85% reduction.
Source
| Field | Value |
| --- | --- |
| Title | Mitigating Sycophancy in Large Language Models via Direct Preference Optimization |
| Publisher | IEEE International Conference on Big Data 2024 |
| Author(s) | Khan et al. |
| Date | 2024 |
| URL | https://ieeexplore.ieee.org/document/10825538/ |
| Type | Research paper (peer-reviewed) |
Summary
| Dimension | Rating |
| --- | --- |
| Reliability | Medium-High |
| Relevance | High |
| Bias: Missing data | Some concerns |
| Bias: Measurement | Some concerns |
| Bias: Selective reporting | Some concerns |
| Bias: Randomization | N/A |
| Bias: Protocol deviation | N/A |
| Bias: COI/Funding | Low risk |
Rationale
| Dimension | Rationale |
| --- | --- |
| Reliability | Peer-reviewed at IEEE Big Data. However, the dataset is relatively small (1000 prompts) and results may not generalize across all sycophancy types. |
| Relevance | Directly demonstrates that DPO (an RLHF alternative) can specifically target sycophancy reduction when paired with appropriate training data. |
| Bias flags | Measurement: sycophancy reduction measured on specific test sets and may not generalize. Missing data: limited to persona-based and preference-driven sycophancy; does not cover all types. Selective reporting: 84-85% reduction is an average; variance not reported. |
Evidence
| Evidence ID | Summary |
| --- | --- |
| SRC05-E01 | DPO with anti-sycophancy preference pairs reduces sycophancy 84-85% |
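For context on the mechanism behind SRC05-E01: DPO trains directly on preference pairs, so anti-sycophancy data simply labels the non-sycophantic response as "chosen" and the sycophantic one as "rejected". A minimal sketch of the standard DPO per-pair loss is below; the function name, example log-probabilities, and the beta value are illustrative, not taken from the paper.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are total sequence log-probabilities of the chosen
    (non-sycophantic) and rejected (sycophantic) responses under
    the policy being trained and under a frozen reference model.
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen response than the reference model does.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)): loss falls as the margin grows.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# With equal policy and reference log-probs the margin is zero and
# the loss equals log(2); favoring the chosen response lowers it.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

Because the reference model anchors the policy, this objective penalizes sycophantic completions without the separate reward model that RLHF pipelines require, which is why the rationale above calls DPO an RLHF alternative.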