SRC04

DeepSeek-R1 paper: a seminal implementation of reinforcement learning with verifiable rewards (RLVR) that demonstrates emergent reasoning.

Source

Field	Value
Title	DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Publisher	arXiv / Nature
Author(s)	DeepSeek Team
Date	2025-01
URL	https://arxiv.org/abs/2501.12948
Type	Research paper

Dimension	Rationale
Reliability	Seminal RLVR paper, published in Nature. Widely cited and replicated.
Relevance	The foundational implementation of RLVR for reasoning, directly relevant to understanding RLVR's capabilities and limitations.
Bias flags	DeepSeek has commercial interest in demonstrating their approach's effectiveness. The paper acknowledges limitations (narrow domain focus).

Evidence ID	Summary
SRC04-E01	DeepSeek-R1 RLVR implementation: rule-based rewards, math/code domains, acknowledged limitations
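
To make the "rule-based rewards" in SRC04-E01 concrete, here is a minimal sketch of how such a reward might be computed for a math answer. It is an illustration only, not the paper's code. The function names, the regular expressions, the `<think>` tag layout, the `\boxed{}` answer convention, and the equal weighting of the two signals are all assumptions chosen for the example.

```python
import re

# Hypothetical format rule: reasoning inside <think>...</think>, then an answer.
THINK_PATTERN = re.compile(r"^<think>.+?</think>\s*\S.*$", re.DOTALL)
# Hypothetical answer convention: the final answer is wrapped in \boxed{...}.
BOXED_PATTERN = re.compile(r"\\boxed\{([^{}]*)\}")


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed tag layout, else 0.0."""
    return 1.0 if THINK_PATTERN.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the last boxed answer equals the reference string."""
    answers = BOXED_PATTERN.findall(completion)
    if not answers:
        return 0.0
    return 1.0 if answers[-1].strip() == reference.strip() else 0.0


def total_reward(completion: str, reference: str) -> float:
    """Sum the two rule-based signals (equal weighting is an assumption)."""
    return accuracy_reward(completion, reference) + format_reward(completion)


if __name__ == "__main__":
    sample = "<think>2 + 2 is 4.</think> The answer is \\boxed{4}."
    print(total_reward(sample, "4"))
```

Both checks are deterministic string rules rather than a learned reward model, which is the property the evidence summary highlights. Exact string matching also shows why this style of reward is easiest to apply in domains such as math and code, where correctness can be verified mechanically.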