SRC04

DeepSeek paper introducing Group Relative Policy Optimization (GRPO), a critic-free variant of PPO, for mathematical reasoning.

Source

Field	Value
Title	DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Publisher	DeepSeek
Author(s)	Zhihong Shao et al.
Date	2024-02-05
URL	https://arxiv.org/abs/2402.03300
Type	Research paper

Dimension	Rationale
Reliability	Published by a major AI lab with an open-source track record. GRPO was later used to train DeepSeek-R1, which supports its practical viability.
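Relevance	Introduces an RL algorithm that is structurally different from PPO: it eliminates the critic (value) model and instead estimates the baseline from the rewards of a group of outputs sampled for the same prompt.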
Bias flags	COI: DeepSeek developed GRPO for its own models. Selective reporting: the claimed compute savings (~50%) have not been independently replicated.

Evidence ID	Summary
SRC04-E01	GRPO eliminates the critic model and reportedly halves compute requirements relative to PPO
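
To make concrete what "eliminates the critic model" means in SRC04-E01, the following is a minimal sketch of the group-relative advantage used under outcome supervision: each sampled output's reward is normalized by the mean and standard deviation of its own group, so no learned value function is needed as a baseline. The function name `group_relative_advantages`, the `eps` stabilizer, and the choice of population standard deviation are assumptions made for illustration, not details taken from the paper's code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    rewards: scalar rewards for outputs sampled from the same prompt.
    eps: small constant (an assumption) that avoids division by zero
         when every output in the group receives the same reward.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    mu = mean(rewards)
    # Population standard deviation is an illustrative choice.
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four sampled answers: two correct, two incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group with identical rewards carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Correct answers receive a positive advantage and incorrect ones a negative advantage purely from within-group comparison. This sketch covers only the baseline computation; it says nothing about the clipped policy objective or KL term, and it does not bear on the ~50% compute claim.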