Research R0040 — RLHF Alternatives
Run 2026-03-28
Query Q001
Search S02
Result S02-R03
Source SRC03

The original Anthropic paper introducing Constitutional AI (CAI), the foundation of RLAIF (reinforcement learning from AI feedback).

Source

Title: Constitutional AI: Harmlessness from AI Feedback
Publisher: Anthropic
Author(s): Yuntao Bai et al.
Date: 2022-12-15
URL: https://arxiv.org/abs/2212.08073
Type: Research paper

Summary

Reliability: High
Relevance: High
Bias (missing data): Low risk
Bias (measurement): Some concerns
Bias (selective reporting): Some concerns
Bias (randomization): N/A
Bias (protocol deviation): N/A
Bias (COI/funding): Some concerns

Rationale

Reliability: Published by Anthropic, a leading AI safety lab; the method has been validated through years of production deployment in Claude.
Relevance: Directly defines the first large-scale RLHF alternative; Constitutional AI kickstarted the RLAIF field.
Bias flags: COI: Anthropic developed CAI and uses it in its commercial product, so self-evaluation metrics may favor the approach. Measurement: constitutional principles are somewhat arbitrary, and their effectiveness is hard to measure objectively.

Evidence Extracts

SRC03-E01: CAI replaces human feedback with AI-generated feedback guided by constitutional principles
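
The extract above names the core mechanism. As a rough illustration only, here is a minimal runnable sketch of the two CAI phases reported in the paper: a critique-and-revise supervised phase, then AI preference labeling (RLAIF). The `call_model` function, its prompts, and the example constitution are hypothetical stand-ins, not the paper's actual prompts or principles.

```python
# Hypothetical sketch of the Constitutional AI feedback loop, assuming a
# generic LLM call. `call_model` is stubbed so the sketch runs offline.

CONSTITUTION = [
    "Choose the response that is least harmful.",
    "Choose the response that is most honest and helpful.",
]

def call_model(prompt: str) -> str:
    """Stand-in for a real LLM API; canned replies keep the sketch runnable."""
    if prompt.startswith("Critique"):
        return "The response could be more careful about harm."
    if prompt.startswith("Rewrite"):
        return "Here is a revised, more harmless response."
    return "A"  # preference label: option A

def critique_and_revise(response: str, constitution: list[str]) -> str:
    """Supervised phase: critique and revise the draft against each principle."""
    for principle in constitution:
        critique = call_model(
            f"Critique this response against: {principle}\n{response}"
        )
        response = call_model(
            f"Rewrite the response to address the critique: {critique}\n{response}"
        )
    return response

def ai_preference(prompt: str, a: str, b: str, principle: str) -> str:
    """RLAIF phase: the model itself labels the preferred response."""
    label = call_model(
        f"Given: {prompt}\nWhich response better satisfies "
        f"'{principle}'?\n(A) {a}\n(B) {b}"
    )
    return a if label.strip().startswith("A") else b

revised = critique_and_revise("Initial draft response.", CONSTITUTION)
preferred = ai_preference("User question", revised, "Other draft", CONSTITUTION[0])
```

In the paper's pipeline, the AI preference labels train a preference model that then provides the reward signal for RL, replacing human preference labels; the sketch stops at label collection.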