R0055/2026-04-01/C005/SRC01/E01
85% reduction in persona tests, 84% in preference tests using DPO with curated anti-sycophancy pairs
Extract
Khan et al. fine-tuned LLMs with DPO (Direct Preference Optimization) on 1,000 prompts, each paired with a sycophantic and a non-sycophantic response. They achieved an 85% average reduction in persona-based sycophancy tests and 84% in preference-driven tests. The key insight is that data curation, not a change to the optimization algorithm, drives the reduction.
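To make the setup concrete, the following is a minimal sketch of what a curated pair and the standard DPO objective for one pair look like. It is not Khan et al.'s code: the `PreferencePair` field names, the `dpo_loss` helper, and the `beta=0.1` default are illustrative assumptions, and the log-probabilities would in practice come from the policy and a frozen reference model.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One curated training example (field names are hypothetical)."""
    prompt: str
    chosen: str    # non-sycophantic response, treated as preferred
    rejected: str  # sycophantic response, treated as dispreferred


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single pair: -log(sigmoid(beta * margin)).

    The margin is how much more the policy favors the chosen response over
    the rejected one, relative to the reference model. beta=0.1 is an
    assumed value, not one taken from the source.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # Numerically stable form of log(1 + exp(-z)).
    if z > 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

The loss function itself is unchanged from ordinary DPO; in this framing the anti-sycophancy effect comes entirely from which response is labeled `chosen` and which `rejected`, which matches the extract's point that the curation does the work.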
Relevance to Hypotheses
| Hypothesis | Relationship | Strength |
|---|---|---|
| H1 | Supports | Strong |
| H2 | Supports | Moderate |
| H3 | Contradicts | Strong |
Context
Evidence directly relevant to testing the claim's factual assertions.