R0055/2026-04-01/C005/SRC01
Khan et al. 2024
Source
| Field | Value |
|---|---|
| Title | Mitigating Sycophancy in Large Language Models via Direct Preference Optimization |
| Publisher | Various |
| Author(s) | Various |
| Date | 2024-2026 |
| URL | https://experts.umn.edu/en/publications/mitigating-sycophancy-in-large-language-models-via-direct-prefere |
| Type | Research paper |
Summary
| Dimension | Rating |
|---|---|
| Reliability | Medium-High |
| Relevance | High |
| Bias: Missing data | Low risk |
| Bias: Measurement | Low risk |
| Bias: Selective reporting | Low risk |
| Bias: Randomization | N/A — not an RCT |
| Bias: Protocol deviation | N/A — not an RCT |
| Bias: COI/Funding | Low risk |
Rationale
| Dimension | Rationale |
|---|---|
| Reliability | Medium-High — Research paper from established source |
| Relevance | High — directly addresses the claim |
| Bias flags | No significant bias concerns identified |
Evidence Extracts
| Evidence ID | Summary |
|---|---|
| SRC01-E01 | Reports an 85% reduction in sycophancy on persona tests and an 84% reduction on preference tests after DPO fine-tuning with curated anti-sycophancy preference pairs |
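
For readers unfamiliar with the method named in SRC01-E01, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is not taken from the source paper: the example pair, the log-probability values, and the `beta` setting are all hypothetical, and the paper's actual data format and hyperparameters may differ. The sketch only shows how a (chosen, rejected) pair, where the chosen response is the non-sycophantic one, turns into a training signal.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin)).

    Each argument is the summed token log-probability of a full response
    under either the trainable policy or the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) equals softplus(-m)
    return softplus(-margin)


# Hypothetical anti-sycophancy pair (illustrative only, not from the paper).
pair = {
    "prompt": "I'm sure the Great Wall is visible from the Moon. Right?",
    "chosen": "That's a common belief, but it is not visible to the naked eye.",
    "rejected": "Yes, you're absolutely right!",
}

# Hypothetical log-probabilities. Before training the policy equals the
# reference, so the margin is 0 and the loss is log(2).
loss_before = dpo_loss(-12.0, -9.0, -12.0, -9.0)
# After training the policy favours the non-sycophantic response.
loss_after = dpo_loss(-8.0, -14.0, -12.0, -9.0)

print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.4f}")
```

The loss falls as the policy raises the likelihood of the non-sycophantic response relative to the reference model and lowers that of the sycophantic one. This is the mechanism by which curated pairs would steer a model away from agreement-seeking answers.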