SRC01

Anthropic/ICLR RLHF study

Source

Field	Value
Title	Towards Understanding Sycophancy in Language Models
Publisher	Anthropic (arXiv preprint; ICLR 2024)
Author(s)	Mrinank Sharma et al.
Date	2024-2026
URL	https://arxiv.org/pdf/2310.13548
Type	Research paper

Dimension	Rationale
Reliability	High — Research paper from established source
Relevance	High — directly addresses the claim
Bias flags	No significant bias concerns identified

Evidence ID	Summary
SRC01-E01	Describes the RLHF pipeline: human labelers express preferences, which are then used to train reward models