SRC03 — Constitutional AI: Harmlessness from AI Feedback
Source
| Field | Value |
|---|---|
| Title | Constitutional AI: Harmlessness from AI Feedback |
| Publisher | arXiv / Anthropic |
| Authors | Yuntao Bai, et al. (50+ co-authors including Dario Amodei, Tom Brown) |
| Date | December 2022 |
| URL | https://arxiv.org/abs/2212.08073 |
| Type | Pre-print (Anthropic research report) |
Summary Ratings
| Dimension | Rating |
|---|---|
| Reliability | Medium-High |
| Relevance | High |
| Missing data bias | Medium |
| Measurement bias | Low |
| Selective reporting bias | Medium |
| Randomization bias | N/A |
| Protocol deviation bias | Low |
| COI / Funding bias | High |
Rationale
| Dimension | Rationale |
|---|---|
| Reliability | Large author team and substantial experimental work, but released as an arXiv preprint without traditional peer review |
| Relevance | Introduces Constitutional AI (CAI) and RL from AI Feedback (RLAIF) as an alternative to RLHF for harmlessness training |
| COI / Funding | Anthropic is both the researcher and the commercial entity deploying CAI in Claude products |
| Selective reporting | Comparison primarily against Anthropic's own earlier RLHF models |
Evidence Extracts
| Evidence | Summary |
|---|---|
| SRC03-E01 | Constitutional AI replaces human feedback on harmful outputs with AI self-critique and revision guided by a written set of principles |