Research R0040 — RLHF Alternatives
Run 2026-04-01
Query Q001
Search S03
Result S03-R01

Original paper introducing GRPO (Group Relative Policy Optimization), from DeepSeek.

Summary

| Field | Value |
|---|---|
| Title | DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models |
| URL | https://arxiv.org/abs/2402.03300 |
| Date accessed | 2026-04-01 |
| Publication date | 2024-02-05 |
| Author(s) | Zhihong Shao et al. (DeepSeek AI) |
| Publication | arXiv preprint |

Selection Decision

Included in evidence base: Yes

Rationale: This is the original paper introducing GRPO and the primary source for a method that has since become a standard approach for training reasoning models.