SRC06 — DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Source

| Field | Value |
| --- | --- |
| Title | DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models |
| Publisher | arXiv |
| Authors | Zhihong Shao, Peiyi Wang, Qihao Zhu, et al. (DeepSeek AI) |
| Date | February 2024 (revised April 2024) |
| URL | https://arxiv.org/abs/2402.03300 |
| Type | Preprint |

Summary Ratings

| Dimension | Rating |
| --- | --- |
| Reliability | Medium-High |
| Relevance | High |
| Missing data bias | Low |
| Measurement bias | Low |
| Selective reporting bias | Medium |
| Randomization bias | N/A |
| Protocol deviation bias | Low |
| COI / Funding bias | Medium |

Rationale

| Dimension | Rationale |
| --- | --- |
| Reliability | Strong experimental results on established benchmarks, but not peer-reviewed |
| Relevance | Introduces GRPO, which became the dominant RL optimizer for open-source reasoning models |
| COI / Funding | DeepSeek AI has a commercial interest in promoting its training methods |

Evidence Extracts

| ID | Evidence Summary |
| --- | --- |
| SRC06-E01 | GRPO eliminates the critic (value) model, halving RLHF compute requirements |
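To make the extract concrete: GRPO's key move is replacing the learned critic with a group-relative baseline. For each prompt, several outputs are sampled, and each output's advantage is its reward normalized against the group's mean and standard deviation. The sketch below is a minimal illustration of that advantage computation only (function name and the zero-spread guard are my own; it is not the paper's code):

```python
from statistics import mean, pstdev

def grpo_advantages(rewards):
    """Group-relative advantages for one prompt's sampled outputs.

    The group mean serves as the baseline, so no learned critic/value
    model is needed -- this is the source of GRPO's resource savings.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard: identical rewards give zero spread
    return [(r - mu) / sigma for r in rewards]

# Example: rewards for 4 sampled answers to the same math problem.
# Above-average answers get positive advantages, below-average negative.
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))
```

These advantages then weight the token-level policy-gradient update in place of critic-estimated values.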