# SRC05 — Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
## Source
| Field | Value |
|---|---|
| Title | Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback |
| Publisher | TMLR 2023 / arXiv |
| Authors | Stephen Casper et al. (32 co-authors) |
| Date | July 2023 |
| URL | https://arxiv.org/abs/2307.15217 |
| Type | Peer-reviewed survey paper |
## Summary Ratings
| Dimension | Rating |
|---|---|
| Reliability | High |
| Relevance | High |
| Missing data bias | Low |
| Measurement bias | Low |
| Selective reporting bias | Low |
| Randomization bias | N/A |
| Protocol deviation bias | N/A |
| COI / Funding bias | Low |
## Rationale
| Dimension | Rationale |
|---|---|
| Reliability | Comprehensive survey of 250+ papers by 32 co-authors from multiple institutions |
| Relevance | Catalogues the specific problems driving the search for RLHF alternatives |
| COI / Funding | Multi-institutional; no single commercial entity dominates |
## Evidence Extracts
| Evidence | Summary |
|---|---|
| SRC05-E01 | RLHF faces both tractable problems and more fundamental limitations spanning human feedback collection, reward model training, and policy optimization |
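
For orientation, the three problem areas in SRC05-E01 map onto the standard two-stage RLHF setup the paper critiques. A minimal sketch in common notation (the symbols $r_\phi$, $\pi_\theta$, $\pi_{\text{ref}}$, $\beta$, $y_w$, $y_l$ follow convention, not necessarily the source's exact notation): a reward model $r_\phi$ is fit to pairwise human preferences with a Bradley-Terry style loss, and the policy $\pi_\theta$ is then optimized against it under a KL penalty toward a reference model $\pi_{\text{ref}}$:

$$
\mathcal{L}_{\text{RM}}(\phi) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(r_\phi(x, y_w) - r_\phi(x, y_l)\right)\right]
$$

$$
\max_{\theta}\; \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot\mid x)}\!\left[r_\phi(x, y)\right] \;-\; \beta\, \mathbb{D}_{\mathrm{KL}}\!\left[\pi_\theta(\cdot\mid x)\,\middle\|\,\pi_{\text{ref}}(\cdot\mid x)\right]
$$

Each failure category attacks one stage: feedback problems corrupt the preference tuples $(x, y_w, y_l)$, reward model problems mean $r_\phi$ misrepresents human values, and policy problems arise when optimizing $\pi_\theta$ exploits errors in $r_\phi$ (reward hacking).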