R0041/2026-03-28/Q003/SRC04
DeepSeek-R1 paper — seminal RLVR implementation demonstrating emergent reasoning.
Source
| Field | Value |
| --- | --- |
| Title | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning |
| Publisher | arXiv / Nature |
| Author(s) | DeepSeek Team |
| Date | 2025-01 |
| URL | https://arxiv.org/abs/2501.12948 |
| Type | Research paper |
Summary
| Dimension | Rating |
| --- | --- |
| Reliability | High |
| Relevance | High |
| Bias: Missing data | Low risk |
| Bias: Measurement | Low risk |
| Bias: Selective reporting | Some concerns |
| Bias: Randomization | N/A (not an RCT) |
| Bias: Protocol deviation | N/A (not an RCT) |
| Bias: COI/Funding | Some concerns |
Rationale
| Dimension | Rationale |
| --- | --- |
| Reliability | Seminal RLVR paper, published in Nature. Widely cited and replicated. |
| Relevance | The foundational implementation of RLVR for reasoning, directly relevant to understanding RLVR's capabilities and limitations. |
| Bias flags | DeepSeek has a commercial interest in demonstrating the effectiveness of its approach. The paper acknowledges limitations (narrow domain focus). |

| Evidence ID | Summary |
| --- | --- |
| SRC04-E01 | DeepSeek-R1 RLVR implementation: rule-based rewards, math/code domains, acknowledged limitations |
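To make "rule-based rewards" in SRC04-E01 concrete, the following is a minimal sketch of a verifiable reward for a math-style task. It combines an accuracy check with a format check, which is the general shape the paper describes. The function names, the weights, the exact-match comparison, and the strictness of the tag template are illustrative assumptions, not the paper's implementation.

```python
import re

# Hypothetical weights, chosen only for illustration.
ACCURACY_WEIGHT = 1.0
FORMAT_WEIGHT = 0.1

# Assumed template: reasoning in <think> tags, final answer in <answer> tags.
TEMPLATE = re.compile(
    r"^<think>(.+?)</think>\s*<answer>(.+?)</answer>$", re.DOTALL
)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion matches the assumed tag template."""
    return 1.0 if TEMPLATE.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the extracted answer equals the reference.

    The comparison is an exact string match after whitespace
    normalization; a real checker would need math-aware equivalence.
    """
    match = TEMPLATE.match(completion.strip())
    if match is None:
        return 0.0
    answer = " ".join(match.group(2).split())
    return 1.0 if answer == " ".join(reference.split()) else 0.0


def rule_based_reward(completion: str, reference: str) -> float:
    """Weighted sum of the accuracy and format checks (no learned model)."""
    return (
        ACCURACY_WEIGHT * accuracy_reward(completion, reference)
        + FORMAT_WEIGHT * format_reward(completion)
    )
```

Because both checks are deterministic rules, the reward can be verified independently of any learned reward model, which is also why the approach is easiest to apply in domains such as math and code where correctness can be checked mechanically.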