R0040/2026-04-01/Q001/SRC06

Research R0040 — RLHF Alternatives
Run 2026-04-01
Query Q001
Search S01
Result S01-R04
Source SRC06

Anthropic -- Constitutional AI: Harmlessness from AI Feedback

Source

| Field | Value |
| --- | --- |
| Title | Constitutional AI: Harmlessness from AI Feedback |
| Publisher | Anthropic |
| Author(s) | Yuntao Bai et al. |
| Date | 2022-12-15 |
| URL | https://arxiv.org/abs/2212.08073 |
| Type | Research paper |

Summary

| Dimension | Rating |
| --- | --- |
| Reliability | High |
| Relevance | High |
| Bias: Missing data | Low risk |
| Bias: Measurement | Some concerns |
| Bias: Selective reporting | Low risk |
| Bias: Randomization | N/A -- not an RCT |
| Bias: Protocol deviation | N/A -- not an RCT |
| Bias: COI/Funding | Some concerns |

Rationale

| Dimension | Rationale |
| --- | --- |
| Reliability | Foundational, widely cited paper from Anthropic. Introduced a method now used in production (Claude). |
| Relevance | Directly introduces Constitutional AI (CAI) and RL from AI Feedback (RLAIF) as an alternative to RLHF. |
| Bias flags | COI/Funding: Anthropic has a commercial interest in CAI's success, although the method is well documented and the paper is transparent about its limitations. Measurement: harmlessness evaluations rely on AI judges. |

Evidence Extracts

| Evidence ID | Summary |
| --- | --- |
| SRC06-E01 | CAI replaces human feedback on harmlessness with AI-generated feedback guided by a written constitution |
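
To make SRC06-E01 concrete, the following is a minimal sketch of AI-generated preference labeling, assuming a judge callable stands in for the feedback model. The names `label_pairs`, `keyword_judge`, and `PRINCIPLE` are hypothetical and written for this illustration; they are not taken from the paper, and the principle text is not a quotation from Anthropic's constitution.

```python
from typing import Callable, List, Tuple

# Hypothetical principle text written for this sketch (not quoted from the paper).
PRINCIPLE = "Choose the response that is less harmful."

# A judge takes (principle, prompt, response_a, response_b) and returns
# 0 if it prefers response_a, 1 if it prefers response_b.
Judge = Callable[[str, str, str, str], int]


def label_pairs(
    pairs: List[Tuple[str, str, str]],
    judge: Judge,
    principle: str = PRINCIPLE,
) -> List[Tuple[str, str, str]]:
    """Turn (prompt, a, b) pairs into (prompt, chosen, rejected) triples.

    The preference comes from an AI judge applying a written principle,
    in place of a human rater.
    """
    labeled = []
    for prompt, a, b in pairs:
        pick = judge(principle, prompt, a, b)
        chosen, rejected = (a, b) if pick == 0 else (b, a)
        labeled.append((prompt, chosen, rejected))
    return labeled


def keyword_judge(principle: str, prompt: str, a: str, b: str) -> int:
    """Toy stand-in for a language-model call: prefers a response that declines."""
    return 0 if "can't help" in a else 1


if __name__ == "__main__":
    pairs = [("How do I pick a lock?", "Here is how...", "I can't help with that.")]
    print(label_pairs(pairs, keyword_judge))
```

In a real pipeline the judge would be a language model prompted with the principle, and the resulting triples would feed preference-model training. The keyword rule here only keeps the sketch runnable without a model.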