DeepSeek-R1: Advancements in Reasoning Models
DeepSeek-R1 represents a breakthrough in reasoning models, developed through large-scale reinforcement learning. Its predecessor, DeepSeek-R1-Zero, was trained purely via reinforcement learning (RL) without supervised fine-tuning (SFT) and demonstrated impressive reasoning capabilities, but it faced challenges such as endless repetition, poor readability, and language mixing. DeepSeek-R1 addresses these issues by incorporating cold-start data before RL, achieving performance comparable to OpenAI-o1 across math, code, and reasoning tasks. DeepSeek-R1-Zero and DeepSeek-R1, along with six dense models distilled from DeepSeek-R1, have been open-sourced to support the research community.
Model Summary
Large-Scale Reinforcement Learning
DeepSeek-R1-Zero was produced by applying RL directly to the base model without any SFT, letting it explore chain-of-thought (CoT) reasoning on its own; this marked a significant milestone, validating that the reasoning capabilities of large language models (LLMs) can be incentivized through RL alone. The pipeline developed for DeepSeek-R1 adds two SFT stages and two RL stages, which seed and refine reasoning ability and align the model's outputs with human preferences across both reasoning and non-reasoning tasks.
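The alternation of SFT and RL stages can be sketched as an ordered list. The stage names below are paraphrases for illustration, not the release's exact terminology; only the two-SFT/two-RL structure comes from the summary above.

```python
# Illustrative sketch of the DeepSeek-R1 training pipeline order.
# Stage descriptions are paraphrases, not official terminology.
PIPELINE = [
    ("sft", "cold-start fine-tuning on curated long-CoT data"),
    ("rl",  "reasoning-oriented reinforcement learning"),
    ("sft", "fine-tuning on sampled reasoning and general data"),
    ("rl",  "reinforcement learning for preference alignment"),
]

def stage_counts(pipeline):
    """Count how many SFT and RL stages the pipeline contains."""
    counts = {"sft": 0, "rl": 0}
    for kind, _description in pipeline:
        counts[kind] += 1
    return counts
```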
Distillation: Compact but Powerful
DeepSeek-R1 demonstrates that the reasoning patterns of larger models can be distilled into smaller, more efficient ones. Notably, the distilled models outperform small models trained with RL directly, offering a robust resource for future research and development. The distillation process produced several fine-tuned dense models with strong benchmark results, released as open-source checkpoints ranging from 1.5B to 70B parameters.
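Distillation here amounts to ordinary supervised fine-tuning on teacher-generated samples. A minimal sketch of assembling such a dataset, where `teacher_generate` is a stub standing in for sampling a reasoning trace from DeepSeek-R1 and the record format is hypothetical:

```python
def teacher_generate(prompt):
    """Stub standing in for sampling a reasoning trace from the teacher model."""
    return f"<think>reasoning about: {prompt}</think> final answer"

def build_distillation_dataset(prompts, generate=teacher_generate):
    """Turn teacher completions into plain SFT records for the student model."""
    return [{"prompt": p, "completion": generate(p)} for p in prompts]

records = build_distillation_dataset(["What is 2 + 2?"])
```

The student is then fine-tuned on these prompt/completion pairs exactly as in standard SFT; no RL is involved in the distillation step itself.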
Model Downloads
- DeepSeek-R1-Zero: 671B total parameters, 37B activated
- DeepSeek-R1: 671B total parameters, 37B activated
- DeepSeek-R1-Distill-Qwen-32B: Derived from Qwen2.5, outperforming OpenAI-o1-mini
Additional distilled models are available through Hugging Face; each is fine-tuned on samples generated by DeepSeek-R1, with slightly modified configurations and tokenizers relative to their base models.
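A small helper for mapping a parameter budget to the corresponding Hugging Face repository id. The repo ids below follow the naming used in the open-source release as I recall it; verify them on the hub before downloading.

```python
# Distilled model repo ids as published on Hugging Face (verify before use).
DISTILL_MODELS = {
    "1.5B": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    "7B":   "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    "8B":   "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    "14B":  "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
    "32B":  "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    "70B":  "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
}

def repo_id(size):
    """Look up the Hugging Face repo id for a distilled model size."""
    return DISTILL_MODELS[size]
```

The returned id can be passed to standard tooling, e.g. `huggingface-cli download` or `transformers`' `AutoModelForCausalLM.from_pretrained`.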
Evaluation Results
DeepSeek-R1 models perform strongly across categories. On English benchmarks such as MMLU and MATH-500, they achieve high Pass@1 scores on knowledge and reasoning tasks. On coding benchmarks such as LiveCodeBench, they excel at problem solving. Chinese tasks also benefit from the series' reasoning ability, with new standards set on benchmarks such as CLUEWSC and C-Eval.
Running and Usage
DeepSeek-R1 models can be accessed via DeepSeek's platform or run locally using the open-source release. Recommended usage includes specific temperature settings and prompt structures, such as placing all instructions in the user prompt rather than a system prompt, to maximize response quality.
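A minimal sketch of building a chat request under the recommended settings. The specific values (temperature around 0.6, no system message, a step-by-step directive with `\boxed{}` for math problems) are taken from the release notes as I recall them; confirm against the official repository before relying on them.

```python
def build_request(user_prompt, math=False, temperature=0.6):
    """Build a chat request following the recommended usage: no system
    prompt, all instructions in the user turn, temperature around 0.6.
    The exact values are assumptions; check the official release notes."""
    if math:
        # Directive recommended for math benchmarks in the release notes.
        user_prompt += ("\nPlease reason step by step, and put your "
                        "final answer within \\boxed{}.")
    return {
        "messages": [{"role": "user", "content": user_prompt}],  # no system role
        "temperature": temperature,
    }
```

The returned dictionary matches the shape of a typical OpenAI-style chat-completions payload, so it can be passed to most compatible serving stacks.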
Licensing and Contact
The DeepSeek-R1 series is released under the MIT License, which permits commercial use and modification. For further inquiries, contact service@deepseek.com.