Research R0040 — RLHF Alternatives
Run 2026-03-28
Query Q001
Search S03
Result S03-R01

Original DeepSeekMath paper introducing GRPO (Group Relative Policy Optimization).

Summary

Title: DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
URL: https://arxiv.org/abs/2402.03300
Date accessed: 2026-03-28
Publication date: 2024-02-05
Author(s): Zhihong Shao et al. (DeepSeek)
Publication: arXiv

Selection Decision

Included in evidence base: Yes

Rationale: Primary source for GRPO, which was subsequently adopted in DeepSeek-R1, one of the most prominent reasoning models. By dropping PPO's separate critic (value) model, GRPO is credited with an approximately 50% reduction in training compute vs PPO.