SRC04 — RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

Source

| Field | Value |
|---|---|
| Title | RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback |
| Publisher | ICML 2024 / arXiv |
| Authors | Harrison Lee, Samrat Phatale, Hassan Mansoor, et al. (Google) |
| Date | September 2023 (published at ICML 2024) |
| URL | https://arxiv.org/abs/2309.00267 |
| Type | Peer-reviewed conference paper |

Summary Ratings

| Dimension | Rating |
|---|---|
| Reliability | High |
| Relevance | High |
| Missing data bias | Low |
| Measurement bias | Medium |
| Selective reporting bias | Low |
| Randomization bias | N/A |
| Protocol deviation bias | Low |
| COI / Funding bias | Medium |

Rationale

| Dimension | Rationale |
|---|---|
| Reliability | Peer-reviewed at ICML 2024; comprehensive multi-task comparison |
| Relevance | Directly compares RLAIF to RLHF with head-to-head evaluation |
| COI / Funding | Google research; commercial interest in scalable alignment for Gemini models |

Evidence Extracts

| Evidence | Summary |
|---|---|
| SRC04-E01 | RLAIF achieves comparable or superior performance to RLHF at much lower cost |