
S01-R01 — Towards Understanding Sycophancy in Language Models

Summary

Title: Towards Understanding Sycophancy in Language Models
URL: https://arxiv.org/abs/2310.13548
Date accessed: 2026-03-29
Publication date: October 2023 (ICLR 2024)
Authors: Mrinank Sharma et al. (19 authors, Anthropic / Oxford)
Publication: ICLR 2024

Selection Decision

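Selected as the primary paper for the link between RLHF and sycophancy. It reports that human preference judgments often favor sycophantic responses and that optimizing against preference models can increase some forms of sycophancy. Peer-reviewed at ICLR 2024.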