R0040/2026-03-28/Q001/S03/R01
The original DeepSeekMath paper, which introduced Group Relative Policy Optimization (GRPO).
Summary
| Field | Value |
|---|---|
| Title | DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models |
| URL | https://arxiv.org/abs/2402.03300 |
| Date accessed | 2026-03-28 |
| Publication date | 2024-02-05 |
| Author(s) | Zhihong Shao et al. (DeepSeek) |
| Publication | arXiv |
Selection Decision
Included in evidence base: Yes
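Rationale: Primary source for GRPO, which was subsequently adopted in DeepSeek-R1, one of the most prominent reasoning models. GRPO estimates the baseline from group scores instead of training a separate critic model as PPO does. Demonstrates an approximately 50% compute reduction versus PPO.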