E01


Research	R0055 — RLHF Yes-Men Claims
Run	2026-04-01
Claim	C002
Source	SRC01
Evidence	SRC01-E01
Type	Factual

RLHF pipeline described: human labelers express preferences, which are used to train reward models

URL: https://arxiv.org/pdf/2310.13548

Extract

RLHF trains models using human preference data. Labelers compare outputs and express which they prefer, creating training signal for reward models that guide policy optimization via reinforcement learning.
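
The reward-model step in this extract can be illustrated with a minimal sketch. It assumes the pairwise (Bradley-Terry) preference loss that is commonly used for reward modeling; the source extract does not name a specific loss, and the function names `preference_loss` and `mean_preference_loss` are hypothetical, introduced here only for illustration.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss for one labeler comparison (assumed Bradley-Terry form).

    Computes -log(sigmoid(reward_chosen - reward_rejected)): the loss is small
    when the reward model scores the labeler-preferred output higher.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_preference_loss(pairs: list[tuple[float, float]]) -> float:
    """Average loss over (reward_chosen, reward_rejected) pairs."""
    return sum(preference_loss(c, r) for c, r in pairs) / len(pairs)
```

Under this assumption, a reward model that cannot distinguish the two outputs (equal scores) incurs a loss of log 2, about 0.693, per comparison, and the loss shrinks as the preferred output is scored further above the rejected one. The trained reward model then supplies the signal for the reinforcement learning stage, which is not sketched here.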

Relevance to Hypotheses

Hypothesis	Relationship	Strength
H1	Supports	Strong
H2	Supports	Moderate
H3	Contradicts	Strong

Context

Evidence directly relevant to testing the claim's factual assertions.