R0044/2026-04-01/Q002/S01/R04
ICLR 2024 paper on understanding sycophancy in language models
Summary
| Field | Value |
|---|---|
| Title | Towards Understanding Sycophancy in Language Models |
| URL | https://arxiv.org/abs/2310.13548 |
| Date accessed | 2026-04-01 |
| Publication date | 2024 (ICLR conference) |
| Author(s) | Sharma, Tong, Korbak, et al. (19 authors) |
| Publication | ICLR 2024 |
Selection Decision
Included in evidence base: No (used as supporting context; not scored as a separate source because its findings are subsumed by SRC01)
Rationale: Foundational technical paper demonstrating that sycophancy is prevalent across five AI assistants. Key finding: both humans and preference models prefer convincingly written sycophantic responses over correct ones a non-negligible fraction of the time. Overlaps significantly with the later Science paper by the same lead author.