R01

Original paper from DeepSeek introducing Group Relative Policy Optimization (GRPO).

Summary

Field	Value
Title	DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
URL	https://arxiv.org/abs/2402.03300
Date accessed	2026-04-01
Publication date	2024-02-05
Author(s)	Zhihong Shao et al. (DeepSeek AI)
Publication	arXiv preprint

Included in evidence base: Yes

Rationale: This is the original paper introducing GRPO, which makes it the primary source for a method that has since been widely adopted for training reasoning models.