R0055/2026-04-01/C002/SRC01/E01

Research R0055 — RLHF Yes-Men Claims
Run 2026-04-01
Claim C002
Source SRC01
Evidence SRC01-E01
Type Factual

RLHF pipeline described: human labelers express preferences used to train reward models

URL: https://arxiv.org/pdf/2310.13548

Extract

RLHF trains models using human preference data. Labelers compare model outputs and indicate which they prefer; these comparisons provide the training signal for a reward model, which in turn guides policy optimization via reinforcement learning.
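The reward-model step described in the extract can be illustrated with a minimal sketch. It assumes a Bradley-Terry style pairwise loss, which is a common choice but is not stated in the extract. The linear reward function, the two-element feature vectors, and the names `train_reward_model` and `comparisons` are hypothetical, chosen only for illustration. The later reinforcement-learning step that optimizes the policy is not shown.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def reward(weights, features):
    # Hypothetical linear reward model: a weighted sum of output features.
    return sum(w * f for w, f in zip(weights, features))

def preference_loss(weights, chosen, rejected):
    # Assumed Bradley-Terry loss: -log sigmoid(r(chosen) - r(rejected)).
    margin = reward(weights, chosen) - reward(weights, rejected)
    return -math.log(sigmoid(margin))

def train_reward_model(comparisons, dim, lr=0.5, epochs=200):
    # Plain stochastic gradient descent on the pairwise loss.
    weights = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in comparisons:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Gradient of the loss w.r.t. weights is
            # -(1 - sigmoid(margin)) * (chosen - rejected).
            scale = 1.0 - sigmoid(margin)
            for i in range(dim):
                weights[i] += lr * scale * (chosen[i] - rejected[i])
    return weights

# Made-up toy data: each entry is (features of preferred output,
# features of rejected output), standing in for labeler comparisons.
comparisons = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.8, 0.2], [0.3, 0.9]),
    ([0.9, 0.1], [0.2, 0.6]),
]

weights = train_reward_model(comparisons, dim=2)
for chosen, rejected in comparisons:
    print(reward(weights, chosen) > reward(weights, rejected))
```

After training on this toy data, the learned reward ranks each preferred output above its rejected counterpart. In a real pipeline the reward model would typically be a neural network scoring full text outputs, and its scores would then serve as the reward signal during policy optimization.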

Relevance to Hypotheses

Hypothesis Relationship Strength
H1 Supports Strong
H2 Supports Moderate
H3 Contradicts Strong

Context

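This evidence bears directly on the claim's factual assertions: it describes the RLHF pipeline itself, from labeler preferences to reward model training to policy optimization.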