SRC04 — RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

Source

| Field | Value |
|---|---|
| Title | RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback |
| Publisher | ICML 2024 / arXiv |
| Authors | Harrison Lee, Samrat Phatale, Hassan Mansoor, et al. (Google) |
| Date | September 2023 (published at ICML 2024) |
| URL | https://arxiv.org/abs/2309.00267 |
| Type | Peer-reviewed conference paper |

Summary Ratings

| Dimension | Rating |
|---|---|
| Reliability | High |
| Relevance | High |
| Missing data bias | Low |
| Measurement bias | Medium |
| Selective reporting bias | Low |
| Randomization bias | N/A |
| Protocol deviation bias | Low |
| COI / Funding bias | Medium |

Rationale

| Dimension | Rationale |
|---|---|
| Reliability | Peer-reviewed at ICML 2024; comprehensive multi-task comparison |
| Relevance | Directly compares RLAIF to RLHF with head-to-head evaluation |
| COI / Funding | Google research; commercial interest in scalable alignment for Gemini models |

Evidence Extracts

| Evidence | Summary |
|---|---|
| SRC04-E01 | RLAIF achieves comparable or superior performance to RLHF at much lower cost |