# SRC06 — DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
## Source
| Field | Value |
|---|---|
| Title | DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models |
| Publisher | arXiv |
| Authors | Zhihong Shao, Peiyi Wang, Qihao Zhu, et al. (DeepSeek AI) |
| Date | February 2024 (revised April 2024) |
| URL | https://arxiv.org/abs/2402.03300 |
| Type | Pre-print |
## Summary Ratings
| Dimension | Rating |
|---|---|
| Reliability | Medium-High |
| Relevance | High |
| Missing data bias | Low |
| Measurement bias | Low |
| Selective reporting bias | Medium |
| Randomization bias | N/A |
| Protocol deviation bias | Low |
| COI / Funding bias | Medium |
## Rationale
| Dimension | Rationale |
|---|---|
| Reliability | Strong experimental results on established benchmarks, but not peer-reviewed |
| Relevance | Introduces Group Relative Policy Optimization (GRPO), which has since been widely adopted as the RL algorithm for open-source reasoning models |
| COI / Funding | DeepSeek AI has a commercial interest in promoting its own training methods |
## Evidence Extracts
| Evidence | Summary |
|---|---|
| SRC06-E01 | GRPO forgoes the critic (value) model, estimating the baseline from group-relative reward scores instead, which substantially reduces the memory and compute of RL training compared with PPO |
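
To make the SRC06-E01 extract concrete, here is a minimal sketch of the group-relative baseline that lets GRPO drop the critic: for each prompt, a group of completions is sampled and scored, and each completion's advantage is its reward normalized against the group's mean and standard deviation. The function name, tensor shapes, and the `eps` numerical guard are illustrative assumptions, not taken from the paper.

```python
import torch


def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Compute GRPO-style advantages without a learned critic.

    rewards: shape (num_prompts, group_size), one scalar reward per
    sampled completion. Normalizing each reward against its own group's
    statistics supplies the baseline that a PPO critic would otherwise
    have to learn.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)  # eps guards against zero-variance groups


# Example: 2 prompts, 4 sampled completions each.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.2, 0.9, 0.4, 0.5]])
print(group_relative_advantages(rewards))
```

Because the baseline comes from sampling statistics rather than a second network of comparable size to the policy, the value model's parameters and its forward/backward passes drop out of the training loop, which is the resource saving the extract refers to.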