SRC03 — Constitutional AI: Harmlessness from AI Feedback

Source


Title	Constitutional AI: Harmlessness from AI Feedback
Publisher	arXiv / Anthropic
Authors	Yuntao Bai et al. (50+ co-authors, including Dario Amodei and Tom Brown)
Date	December 2022
URL	https://arxiv.org/abs/2212.08073
Type	Pre-print (Anthropic research report)

Dimension	Rationale
Reliability	Large author team and substantial experimental work, but not peer-reviewed in the traditional sense
Relevance	Introduces Constitutional AI (CAI) and RLAIF as direct RLHF alternatives
COI / Funding	Anthropic is both the researcher and the commercial entity deploying CAI in Claude products
Selective reporting	Comparison primarily against Anthropic's own earlier RLHF models

Evidence	Summary
SRC03-E01	Constitutional AI replaces human feedback on harmlessness with AI self-critique and revision guided by a written set of principles (the "constitution")