Research R0040 — RLHF Alternatives
Run 2026-04-01
Query Q001
Search S04
Result S04-R04

Original SPIN paper on self-play fine-tuning.

Summary

| Field | Value |
|---|---|
| Title | Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models |
| URL | https://arxiv.org/abs/2401.01335 |
| Date accessed | 2026-04-01 |
| Publication date | 2024-01-02 (arXiv v1); published at ICML 2024 |
| Author(s) | Zixiang Chen et al. (UCLA) |
| Publication | ICML 2024 |

Selection Decision

Included in evidence base: Yes

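Rationale: Original peer-reviewed paper (ICML 2024) introducing SPIN. It shows that a supervised fine-tuned (SFT) model can be improved without additional human-annotated data. The model is trained to distinguish the human-written SFT responses from responses generated by its own previous iteration, which demonstrates self-play as a viable alignment paradigm.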
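To make the self-play mechanism concrete, the sketch below computes a SPIN-style loss as we read it from the paper. For each prompt x, y is the human SFT response and y' is a response sampled from the previous iterate θ_t. The loss is ℓ(t) = log(1 + e^(-t)), where t = λ·log(p_θ(y|x)/p_θt(y|x)) − λ·log(p_θ(y'|x)/p_θt(y'|x)). It assumes sequence log-probabilities have already been computed. The function names and the default λ = 0.1 are hypothetical illustrations, not the authors' code.

```python
import math
from typing import Sequence


def logistic_loss(t: float) -> float:
    """l(t) = log(1 + exp(-t)), computed stably for large |t|."""
    if t >= 0:
        return math.log1p(math.exp(-t))
    # For t < 0, rewrite as -t + log(1 + exp(t)) to avoid overflow in exp(-t).
    return -t + math.log1p(math.exp(t))


def spin_pair_loss(
    logp_real: float,        # log p_theta(y | x), y = human SFT response
    logp_real_prev: float,   # log p_theta_t(y | x) under the previous iterate
    logp_synth: float,       # log p_theta(y' | x), y' sampled from p_theta_t
    logp_synth_prev: float,  # log p_theta_t(y' | x)
    lam: float = 0.1,        # hypothetical default for the weight lambda
) -> float:
    # Reward raising the likelihood of human data relative to theta_t
    # while lowering the likelihood of the model's own past generations.
    t = lam * (logp_real - logp_real_prev) - lam * (logp_synth - logp_synth_prev)
    return logistic_loss(t)


def spin_batch_loss(
    pairs: Sequence[tuple[float, float, float, float]], lam: float = 0.1
) -> float:
    """Average the per-pair loss over a batch of (real, real_prev, synth, synth_prev)."""
    return sum(spin_pair_loss(*p, lam=lam) for p in pairs) / len(pairs)


if __name__ == "__main__":
    # When theta equals theta_t, every log-ratio is zero and the loss is log 2.
    print(spin_batch_loss([(-1.0, -1.0, -2.0, -2.0)]))
```

When the current model still matches the previous iterate, it cannot yet tell the two response sources apart, so the loss sits at log 2. It falls only as the model favors human responses over its own earlier outputs. In the paper, the updated model then becomes the opponent for the next iteration.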