S04 — Constitutional AI and RLAIF

Summary


Source / Database	Web (Google via WebSearch) + arXiv
Query terms	"constitutional AI Anthropic alternative RLHF principles self-critique"; "RLAIF reinforcement learning AI feedback replace RLHF sycophancy"
Filters	None
Results returned	20 (10 per query)
Results selected	4
Results rejected	16

Selected results

Result	Title	URL	Rationale
S04-R01	Constitutional AI: Harmlessness from AI Feedback (arXiv)	https://arxiv.org/abs/2212.08073	Primary CAI paper
S04-R02	RLAIF vs. RLHF (arXiv)	https://arxiv.org/abs/2309.00267	Primary RLAIF comparison paper
S04-R03	Claude's Constitution (Anthropic)	https://www.anthropic.com/constitution	Production deployment details
S04-R04	Anthropic Soul Spec and sycophancy approach	https://www.anthropic.com/news/claudes-constitution	Practical anti-sycophancy measures

Rejected results

Result	Title	URL	Rationale
S04-R05	CAI PDF (duplicate)	https://arxiv.org/pdf/2212.08073	Duplicate format
S04-R06	On 'Constitutional' AI (digi-con)	https://digi-con.org/on-constitutional-ai/	Policy commentary, not technical
S04-R07	Constitution or Collapse (arXiv)	https://arxiv.org/html/2504.04918v1	Focused on Llama application, not core CAI
S04-R08 to S04-R16	Various secondary sources	Various	Blog posts, tutorials, or duplicate coverage

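The two searches were combined because CAI and RLAIF are closely related. Both replace human preference labels with AI-generated feedback. In CAI, this applies only to the harmlessness objective, and helpfulness labels still come from humans.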
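Both primary papers (S04-R01, S04-R02) turn an AI labeler's judgment into a preference signal for reward-model training. The sketch below is a minimal illustration under stated assumptions, not either paper's implementation.

- `labeler` is a hypothetical callable that returns the log-probabilities of answering "1" and "2".
- The prompt template is invented for illustration.
- Averaging over both response orderings to reduce position bias follows the approach described in the RLAIF paper.
- Drawing one principle per comparison is in the spirit of CAI's constitution.
- The reward-model loss is a standard Bradley-Terry-style cross-entropy on the resulting soft label.

```python
import math
import random
from typing import Callable, Sequence, Tuple

# Hypothetical labeler interface (assumption): given a prompt, return the
# log-probabilities the feedback model assigns to answering "1" and "2".
# A real system would query an LLM here.
Labeler = Callable[[str], Tuple[float, float]]


def softmax2(a: float, b: float) -> float:
    """Return P(first option) from two log-probabilities via a two-way softmax."""
    m = max(a, b)  # subtract the max for numerical stability
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb)


def build_prompt(principle: str, context: str, first: str, second: str) -> str:
    # Illustrative template only; not the wording used in either paper.
    return (
        f"{principle}\n\nConversation:\n{context}\n\n"
        f"Response 1: {first}\nResponse 2: {second}\n"
        "Which response is better? Answer 1 or 2."
    )


def ai_preference(labeler: Labeler, principles: Sequence[str], context: str,
                  resp_a: str, resp_b: str, rng: random.Random) -> float:
    """Soft label P(resp_a preferred over resp_b) produced by an AI labeler."""
    principle = rng.choice(principles)  # one principle drawn per comparison
    # Ask with resp_a shown first, then with resp_b shown first.
    p_a_first = softmax2(*labeler(build_prompt(principle, context, resp_a, resp_b)))
    p_b_first = softmax2(*labeler(build_prompt(principle, context, resp_b, resp_a)))
    # Average the two estimates of P(resp_a preferred) to cancel position bias.
    return 0.5 * (p_a_first + (1.0 - p_b_first))


def reward_model_loss(p_label: float, r_a: float, r_b: float) -> float:
    """Cross-entropy between the soft AI label and the Bradley-Terry
    probability sigmoid(r_a - r_b) implied by reward-model scores."""
    p_model = 1.0 / (1.0 + math.exp(-(r_a - r_b)))
    return -(p_label * math.log(p_model) + (1.0 - p_label) * math.log(1.0 - p_model))
```

Querying both orderings matters for the soft label. A labeler that always favors whichever response appears first (for example, one that always returns `(0.0, -1.0)`) yields a label of 0.5 after averaging, rather than a spurious preference.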