SRC06

Anthropic -- Constitutional AI: Harmlessness from AI Feedback

Source

Field	Value
Title	Constitutional AI: Harmlessness from AI Feedback
Publisher	Anthropic
Author(s)	Yuntao Bai et al.
Date	2022-12-15
URL	https://arxiv.org/abs/2212.08073
Type	Research paper

Dimension	Rationale
Reliability	Foundational, widely cited paper from Anthropic. Introduced a method now used in production (Claude).
Relevance	Directly introduces Constitutional AI (CAI) and RL from AI Feedback (RLAIF) as an alternative to RLHF for harmlessness training.
Bias flags	Anthropic has commercial interest in CAI's success. However, the method is well-documented and the paper is transparent about limitations. Measurement concern: harmlessness evaluations rely on AI judges.

Evidence ID	Summary
SRC06-E01	CAI replaces human labels for harmful outputs with AI-generated feedback guided by a constitution (a written list of principles)