# SRC03-E01 — Constitutional AI Replaces Human Feedback with Principles

## Extract
Constitutional AI trains "a harmless AI assistant through self-improvement without human labels for harmful outputs." The approach uses "a list of rules or principles" (a "constitution") and involves two phases: supervised learning with AI self-critique and revision, followed by "RL from AI Feedback" (RLAIF) where "an AI model evaluates response quality" instead of human annotators. The method "creates more harmless models with minimal impact on helpfulness."
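The supervised phase described in the extract (AI self-critique followed by revision against a constitutional principle) can be sketched as a loop. This is a minimal illustration, not Anthropic's implementation: `generate` stands in for a hypothetical LLM call and is stubbed with canned strings so the control flow is runnable, and the principles and prompt templates are placeholders.

```python
# Sketch of Constitutional AI's supervised phase: draft, critique against a
# principle, then revise. `generate` is a hypothetical model call, stubbed
# here with canned strings so the loop structure is runnable.

CONSTITUTION = [
    "Choose the response that is least harmful.",
    "Choose the response that is most helpful and honest.",
]

def generate(prompt: str) -> str:
    # Stand-in for a real model call; keyed off the prompt template.
    if "Rewrite the response" in prompt:
        return "Here is a revised, more careful response."
    if "Critique the response" in prompt:
        return "The response could be more careful."
    return "Initial draft response."

def critique_and_revise(user_prompt: str) -> str:
    """One critique/revision pass per constitutional principle."""
    response = generate(user_prompt)
    for principle in CONSTITUTION:
        critique = generate(
            f"Critique the response against this principle: {principle}\n"
            f"Response: {response}\nCritique:"
        )
        response = generate(
            f"Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nRewrite:"
        )
    return response
```

In the paper's recipe, the revised responses then become supervised fine-tuning targets before the RLAIF phase begins.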
## Relevance to Hypotheses
| Hypothesis | Relationship | Strength |
|---|---|---|
| H1 | Strongly supports — CAI is a deployed alternative to RLHF | Strong |
| H2 | Contradicts — CAI is in production at Anthropic | Strong |
| H3 | Supports — CAI partly replaces and partly augments RLHF | Moderate |
## Context
Constitutional AI is notable for being the first major alternative that changed the feedback source (from human to AI) rather than just the optimization algorithm. It is deployed in production in Anthropic's Claude models.
## Notes
CAI still uses an RL training loop; the innovation lies in the feedback source rather than in eliminating RL. In that sense it is more accurately described as RLAIF, swapping AI preference labels for human ones, than as a replacement for RL-based fine-tuning itself.
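The AI-feedback step that distinguishes RLAIF from RLHF can be sketched as a preference-labeling pass. Everything below is illustrative: a real system prompts a feedback model with a constitutional principle to compare two responses, whereas `ai_judge` here is a toy stub heuristic that merely prefers responses without obviously harmful phrasing.

```python
# Toy sketch of the RLAIF labeling step: an AI judge ranks response pairs,
# producing the preference data that would train a reward model for the RL
# loop. `ai_judge` is a stub heuristic, not a real feedback model.

HARMFUL_MARKERS = ("here's how to", "step-by-step instructions")

def ai_judge(principle: str, response_a: str, response_b: str) -> int:
    """Return 0 if response_a is preferred under the principle, else 1."""
    def score(r: str) -> int:
        return -sum(marker in r.lower() for marker in HARMFUL_MARKERS)
    return 0 if score(response_a) >= score(response_b) else 1

def label_preferences(prompts, sampler, principle):
    """Build (chosen, rejected) pairs for reward-model training."""
    pairs = []
    for prompt in prompts:
        a, b = sampler(prompt), sampler(prompt)  # two samples per prompt
        if ai_judge(principle, a, b) == 0:
            chosen, rejected = a, b
        else:
            chosen, rejected = b, a
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs
```

The resulting pairs play the same role human comparison data plays in RLHF, which is why the RL machinery downstream of the labels can stay unchanged.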