# SRC06 — DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
## Source
| Field | Value |
|---|---|
| Title | DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models |
| Publisher | arXiv |
| Authors | Zhihong Shao, Peiyi Wang, Qihao Zhu, et al. (DeepSeek AI) |
| Date | February 2024 (revised April 2024) |
| URL | https://arxiv.org/abs/2402.03300 |
| Type | Pre-print |
## Summary Ratings
| Dimension | Rating |
|---|---|
| Reliability | Medium-High |
| Relevance | High |
| Missing data bias | Low |
| Measurement bias | Low |
| Selective reporting bias | Medium |
| Randomization bias | N/A |
| Protocol deviation bias | Low |
| COI / Funding bias | Medium |
## Rationale
| Dimension | Rationale |
|---|---|
| Reliability | Strong experimental results on established benchmarks, but not peer-reviewed |
| Relevance | Introduces Group Relative Policy Optimization (GRPO), which has since been widely adopted as the RL algorithm for open-source reasoning models |
| COI / Funding | DeepSeek AI has a commercial interest in promoting its own training methods |
## Evidence Extracts
| Evidence | Summary |
|---|---|
| SRC06-E01 | GRPO forgoes the critic (value) model, estimating the baseline from group-relative reward scores instead, which substantially reduces the memory and compute of RL training compared with PPO |
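
To make the SRC06-E01 extract concrete, here is a minimal sketch of the group-relative baseline that lets GRPO drop the critic: for each prompt, a group of completions is sampled and scored, and each completion's advantage is its reward normalized against the group's mean and standard deviation. The function name, tensor shapes, and the `eps` numerical guard are illustrative assumptions, not taken from the paper.

```python
import torch


def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Compute GRPO-style advantages without a learned critic.

    rewards: shape (num_prompts, group_size), one scalar reward per
    sampled completion. Normalizing each reward against its own group's
    statistics supplies the baseline that a PPO critic would otherwise
    have to learn.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)  # eps guards against zero-variance groups


# Example: 2 prompts, 4 sampled completions each.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.2, 0.9, 0.4, 0.5]])
print(group_relative_advantages(rewards))
```

Because the baseline comes from sampling statistics rather than a second network of comparable size to the policy, the value model's parameters and its forward/backward passes drop out of the training loop, which is the resource saving the extract refers to.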