Research R0040 — RLHF Alternatives
Run 2026-03-29
Query Q001 — RLHF Alternatives
Search S03
Result S03-R01

S03-R01 — DeepSeekMath: Pushing the Limits of Mathematical Reasoning

Summary

Title: DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
URL: https://arxiv.org/abs/2402.03300
Date accessed: 2026-03-29
Publication date: February 2024
Authors: Zhihong Shao, Peiyi Wang, Qihao Zhu, et al.
Publication: arXiv (DeepSeek AI)

Selection Decision

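Selected as the primary paper introducing GRPO (Group Relative Policy Optimization), which became the dominant RL optimizer for open reasoning models. GRPO is a variant of PPO that drops the learned value (critic) model. It instead computes each response's advantage relative to the rewards of a group of responses sampled for the same prompt, which reduces the memory and compute needed for RL training.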
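GRPO's core idea can be sketched in a few lines. Instead of a learned value model, it uses a baseline computed from a group of responses sampled for the same prompt. The sketch below is a minimal illustration, not DeepSeek's implementation. It assumes outcome supervision (one scalar reward per sampled response) and a population standard deviation with a small `eps` in the normalization. The defaults `clip_eps=0.2` and `beta=0.04` are illustrative. All function names are hypothetical. The structure follows the paper's description: group-normalized advantages, a PPO-style clipped surrogate averaged over tokens and responses, and a per-token KL penalty toward a reference model using the estimator pi_ref/pi_theta - log(pi_ref/pi_theta) - 1.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each response's reward against its group (same prompt).

    Outcome supervision: every token of response i shares the advantage
    (r_i - mean(r)) / std(r). Using population std plus eps is an assumption.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def kl_estimate(logp_policy: float, logp_ref: float) -> float:
    """Per-token KL estimator pi_ref/pi_theta - log(pi_ref/pi_theta) - 1 (always >= 0)."""
    log_ratio = logp_ref - logp_policy
    return math.exp(log_ratio) - log_ratio - 1.0


def clipped_token_objective(logp_new: float, logp_old: float,
                            advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped surrogate for a single token."""
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)


def grpo_objective(new_logps: list[list[float]], old_logps: list[list[float]],
                   ref_logps: list[list[float]], rewards: list[float],
                   clip_eps: float = 0.2, beta: float = 0.04) -> float:
    """Objective to maximize for one prompt's group of G sampled responses.

    new_logps[i][t] is the current policy's log-prob of token t in response i;
    old_logps are from the sampling policy, ref_logps from the frozen reference.
    """
    advantages = group_relative_advantages(rewards)
    total = 0.0
    for new_seq, old_seq, ref_seq, adv in zip(new_logps, old_logps, ref_logps, advantages):
        per_token = [
            clipped_token_objective(n, o, adv, clip_eps) - beta * kl_estimate(n, r)
            for n, o, r in zip(new_seq, old_seq, ref_seq)
        ]
        total += sum(per_token) / len(per_token)  # 1/|o_i| average over tokens
    return total / len(rewards)                   # 1/G average over the group


# Usage: four sampled answers, reward 1.0 if the final answer is correct.
rewards = [1.0, 0.0, 0.0, 1.0]
print([round(a, 3) for a in group_relative_advantages(rewards)])  # [1.0, -1.0, -1.0, 1.0]
```

Because advantages are centered within each group, a prompt whose sampled responses all receive the same reward gets zero advantage for every token. That prompt therefore contributes only the KL term, with no policy-gradient signal.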