SRC03 — Constitutional AI: Harmlessness from AI Feedback

Source

Title Constitutional AI: Harmlessness from AI Feedback
Publisher arXiv / Anthropic
Authors Yuntao Bai et al. (50+ co-authors, including Dario Amodei and Tom Brown)
Date December 2022
URL https://arxiv.org/abs/2212.08073
Type Pre-print (Anthropic research report)

Summary Ratings

Dimension Rating
Reliability Medium-High
Relevance High
Missing data bias Medium
Measurement bias Low
Selective reporting bias Medium
Randomization bias N/A
Protocol deviation bias Low
COI / Funding bias High

Rationale

Dimension Rationale
Reliability Large author team and substantial experimental work, but a pre-print that has not been peer-reviewed in the traditional sense
Relevance Introduces Constitutional AI (CAI) and RLAIF as direct RLHF alternatives
COI / Funding bias Anthropic is both the researcher and the commercial entity deploying CAI in Claude products
Selective reporting bias Comparisons are made primarily against Anthropic's own earlier RLHF models

Evidence Extracts

Evidence Summary
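SRC03-E01 Constitutional AI replaces human harmlessness labels with principle-based AI self-critique and revision, followed by AI-generated preference feedback (RLAIF); human feedback is still used for helpfulness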