SRC08 — Open Problems and Fundamental Limitations of RLHF

Source


Title	Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Publisher	TMLR 2023 / arXiv
Authors	Stephen Casper, et al. (32 co-authors)
Date	July 2023
URL	https://arxiv.org/abs/2307.15217
Type	Peer-reviewed survey paper

Dimension	Rationale
Reliability	Comprehensive survey of 250+ papers, multi-institutional authorship
Relevance	Systematically categorizes RLHF problems, including those that drive sycophancy

Evidence	Summary
SRC08-E01	RLHF has fundamental (not merely tractable) limitations across human feedback, the reward model, and the policy