SRC04 — RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback¶
Source¶
| Field | Value |
|---|---|
| Title | RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback |
| Publisher | ICML 2024 / arXiv |
| Authors | Harrison Lee, Samrat Phatale, Hassan Mansoor, et al. (Google Research) |
| Date | September 2023 (arXiv); published at ICML 2024 |
| URL | https://arxiv.org/abs/2309.00267 |
| Type | Peer-reviewed conference paper |
Summary Ratings¶
| Dimension | Rating |
|---|---|
| Reliability | High |
| Relevance | High |
| Missing data bias | Low |
| Measurement bias | Medium |
| Selective reporting bias | Low |
| Randomization bias | N/A |
| Protocol deviation bias | Low |
| COI / Funding bias | Medium |
Rationale¶
| Dimension | Rationale |
|---|---|
| Reliability | Peer-reviewed at ICML 2024; comprehensive multi-task comparison with human evaluation |
| Relevance | Directly compares RLAIF to RLHF in head-to-head evaluations on the same tasks |
| COI / Funding | Google research; commercial interest in scalable alignment for Gemini models |
Evidence Extracts¶
| Evidence | Summary |
|---|---|
| SRC04-E01 | RLAIF achieves performance comparable to RLHF across summarization and dialogue tasks, while AI preference labels are substantially cheaper to collect than human labels |