S03-R01 — DeepSeekMath: Pushing the Limits of Mathematical Reasoning

Summary


Title	DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
URL	https://arxiv.org/abs/2402.03300
Date accessed	2026-03-29
Publication date	February 2024
Authors	Zhihong Shao, Peiyi Wang, Qihao Zhu, et al.
Publication	arXiv (DeepSeek AI)

Selected as the primary paper introducing Group Relative Policy Optimization (GRPO), a critic-free variant of PPO that later became the dominant RL optimizer for open reasoning models, including DeepSeek-R1.
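As described in the paper, GRPO differs from PPO in two main ways. First, it drops the learned value model: for each question it samples a group of outputs and, under outcome supervision, sets each output's advantage to its reward normalized by the group's mean and standard deviation. Second, it adds a KL penalty against a reference policy directly to the objective instead of folding it into the reward. The sketch below illustrates both ideas for a single token and is not the paper's code. The function names, the use of population standard deviation, the `eps` term, and the default `clip_eps` and `beta` values are all assumptions chosen for illustration.

```python
import math
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Outcome-supervision advantage for one group of sampled outputs.

    Each output's reward is normalized by the group mean and std, so the
    group itself serves as the baseline and no critic is needed.
    Population std and the eps term are assumptions of this sketch.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def kl_estimate(logp_theta: float, logp_ref: float) -> float:
    """Per-token estimator pi_ref/pi_theta - log(pi_ref/pi_theta) - 1 (always >= 0)."""
    log_ratio = logp_ref - logp_theta
    return math.exp(log_ratio) - log_ratio - 1.0


def grpo_token_objective(
    logp_new: float,
    logp_old: float,
    logp_ref: float,
    advantage: float,
    clip_eps: float = 0.2,  # illustrative clipping range
    beta: float = 0.04,  # illustrative KL weight
) -> float:
    """Clipped PPO-style surrogate minus a KL penalty, for one token.

    This quantity is maximized; a trainer would average it over the tokens
    and outputs of a group and minimize its negative as the loss.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    surrogate = min(ratio * advantage, clipped_ratio * advantage)
    return surrogate - beta * kl_estimate(logp_new, logp_ref)


if __name__ == "__main__":
    # Binary correctness rewards for four sampled answers to one question.
    advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
    print([round(a, 4) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Because the baseline is the group mean, a group in which every output receives the same reward yields all-zero advantages, so that group contributes no policy-gradient signal and only the KL term remains.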