S01-R01 — Towards Understanding Sycophancy in Language Models
Summary
| Title | Towards Understanding Sycophancy in Language Models |
| URL | https://arxiv.org/abs/2310.13548 |
| Date accessed | 2026-03-29 |
| Publication date | October 2023 (ICLR 2024) |
| Authors | Mrinank Sharma et al. (19 authors, Anthropic / Oxford) |
| Publication | ICLR 2024 |
Selection Decision
Selected as the primary paper establishing the RLHF-sycophancy causal link. Peer-reviewed at ICLR 2024.