# R0040/2026-04-01/Q001/S03/R01
Original paper introducing GRPO (Group Relative Policy Optimization), from DeepSeek.
## Summary
| Field | Value |
|---|---|
| Title | DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models |
| URL | https://arxiv.org/abs/2402.03300 |
| Date accessed | 2026-04-01 |
| Publication date | 2024-02-05 |
| Author(s) | Zhihong Shao et al. (DeepSeek AI) |
| Publication | arXiv preprint |
## Selection Decision
Included in evidence base: Yes
Rationale: Original paper introducing GRPO. Primary source for the method, which was later widely adopted for training reasoning models (e.g., DeepSeek-R1).