E01


Research	R0055 — RLHF Yes-Men Claims
Run	2026-04-01
Claim	C007
Source	SRC01
Evidence	SRC01-E01
Type	Factual

All six methods are confirmed as post-training alternatives to RLHF, with varying levels of adoption.

URL: https://llm-stats.com/blog/research/post-training-techniques-2026

Extract

The 2026 post-training landscape includes DPO, SimPO, KTO, GRPO, ORPO, IPO, Constitutional AI, RLVR, and DAPO. DPO and GRPO are widely adopted in production pipelines. Constitutional AI is specific to Anthropic. KTO and ORPO have more limited adoption. RLVR is emerging for verifiable-answer domains. Modern pipelines combine SFT + DPO + GRPO + Constitutional AI guardrails.

Relevance to Hypotheses

Hypothesis	Relationship	Strength
H1	Supports	Moderate
H2	Supports	Strong
H3	Contradicts	Strong

Context

This evidence is directly relevant to testing the claim's factual assertions.