R0040/2026-03-28/Q001/SRC03
The original Anthropic paper on Constitutional AI (CAI), the foundation of reinforcement learning from AI feedback (RLAIF).
Source

| Field | Value |
| --- | --- |
| Title | Constitutional AI: Harmlessness from AI Feedback |
| Publisher | Anthropic |
| Author(s) | Yuntao Bai et al. |
| Date | 2022-12-15 |
| URL | https://arxiv.org/abs/2212.08073 |
| Type | Research paper |
Summary

| Dimension | Rating |
| --- | --- |
| Reliability | High |
| Relevance | High |
| Bias: Missing data | Low risk |
| Bias: Measurement | Some concerns |
| Bias: Selective reporting | Some concerns |
| Bias: Randomization | N/A |
| Bias: Protocol deviation | N/A |
| Bias: COI/Funding | Some concerns |
Rationale

| Dimension | Rationale |
| --- | --- |
| Reliability | From Anthropic, a leading AI safety lab; the method has been validated through years of production deployment in Claude. |
| Relevance | Directly defines the first large-scale alternative to RLHF; Constitutional AI kickstarted the RLAIF field. |
| Bias flags | COI: Anthropic developed CAI and uses it in its commercial product, so self-evaluation metrics may favor the approach. Measurement: the constitutional principles are somewhat arbitrary, and their effectiveness is hard to measure objectively. |
| Evidence ID | Summary |
| --- | --- |
| SRC03-E01 | CAI replaces human feedback with AI-generated feedback guided by constitutional principles |
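The substitution described in SRC03-E01 can be sketched as a preference-labeling loop: instead of a human rater, a model is asked which of two responses better satisfies each constitutional principle, and the votes become the preference label used for reward-model training. This is a minimal illustrative sketch, not the paper's implementation; `ask_model` is a hypothetical stand-in for a real LLM call, and the toy constitution and heuristic inside it are assumptions for the sake of a runnable example.

```python
# Sketch of RLAIF preference labeling per SRC03-E01. The constitution
# below is a toy two-principle example, not Anthropic's actual list.
CONSTITUTION = [
    "Choose the response that is least harmful.",
    "Choose the response that is most honest.",
]

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call (real CAI queries a model).

    Toy heuristic: parse the two options back out of the prompt and
    prefer the shorter, more cautious-looking response.
    """
    option_a, option_b = prompt.split("(A) ")[1].split("\n(B) ")
    return "A" if len(option_a) <= len(option_b) else "B"

def ai_preference(question: str, resp_a: str, resp_b: str) -> str:
    """Label the preferred response using constitutional principles
    instead of a human rater -- the core RLAIF substitution."""
    votes = []
    for principle in CONSTITUTION:
        prompt = (
            f"{principle}\nQuestion: {question}\n"
            f"(A) {resp_a}\n(B) {resp_b}\nAnswer A or B:"
        )
        votes.append(ask_model(prompt))
    # Majority vote across principles yields the AI-generated label.
    return max(set(votes), key=votes.count)
```

In the full method, these AI-generated labels replace human comparison data when training the preference model that drives the RL stage.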