SRC05 — Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Source

Title      Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Publisher  TMLR 2023 / arXiv
Authors    Stephen Casper et al. (32 co-authors)
Date       July 2023
URL        https://arxiv.org/abs/2307.15217
Type       Peer-reviewed survey paper

Summary Ratings

Dimension                 Rating
Reliability               High
Relevance                 High
Missing data bias         Low
Measurement bias          Low
Selective reporting bias  Low
Randomization bias        N/A
Protocol deviation bias   N/A
COI / Funding bias        Low

Rationale

Dimension      Rationale
Reliability    Comprehensive survey of 250+ papers by 32 co-authors from multiple institutions
Relevance      Catalogues the specific problems driving the search for RLHF alternatives
COI / Funding  Multi-institutional authorship; no single commercial entity dominates

Evidence Extracts

Evidence    Summary
SRC05-E01   RLHF has both tractable problems and fundamental limitations in each of its three components: human feedback, the reward model, and the policy