DeepSeek-R1: Advancements in Generative AI Models
Introduction
We introduce DeepSeek-R1-Zero and DeepSeek-R1, our first-generation reasoning models. DeepSeek-R1-Zero, trained using large-scale reinforcement learning (RL) without a preliminary supervised fine-tuning (SFT) step, shows remarkable reasoning abilities but suffers from issues such as endless repetition. DeepSeek-R1 addresses these issues by incorporating cold-start data before RL, achieving performance comparable to OpenAI-o1 across diverse tasks. We also make DeepSeek models and their distillations available to the research community, setting new benchmarks for dense models.
Model Summary
Post-Training: Large-Scale Reinforcement Learning
We applied RL directly to the base model, bypassing the need for SFT. This approach enabled the development of DeepSeek-R1-Zero, which is capable of self-verification, reflection, and generating extensive chains of thought (CoTs). Our research validates that the reasoning capabilities of large language models (LLMs) can be enhanced through pure RL alone.
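To make the RL setup concrete, here is a minimal, illustrative sketch of group-relative advantage estimation in the style of GRPO, which DeepSeek has described in prior work. This is an assumption about the general technique, not DeepSeek-R1's exact implementation: several completions are sampled per prompt, scored with a rule-based reward (e.g., 1.0 if the final answer is correct), and each reward is normalized against its group, avoiding a learned critic.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards):
    """GRPO-style estimator: advantage_i = (r_i - mean(r)) / std(r),
    computed over a group of completions sampled for the same prompt.
    No value (critic) network is needed."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:  # all completions scored the same: no learning signal
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Rule-based rewards for 4 sampled completions of one math prompt:
# 1.0 if the final answer matched the reference, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))  # → [1.0, -1.0, -1.0, 1.0]
```

Correct completions receive positive advantages and incorrect ones negative, so the policy gradient pushes probability mass toward reasoning traces that reach correct answers.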
Distillation: Smaller Models Can Be Powerful Too
Distillation allows the reasoning patterns of large models to be transferred to smaller models, yielding dense models with strong reasoning performance. We open-source the DeepSeek-R1 pipeline, alongside its API, to help the community distill efficient models using generated reasoning data.
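As a hedged sketch of what "distilling with generated reasoning data" can look like: teacher-generated traces are packed into SFT samples whose targets contain the chain of thought followed by the final answer. The `<think>` tag format and the helper below are illustrative assumptions, not DeepSeek's published data schema.

```python
def to_sft_sample(question, reasoning, answer):
    """Pack one teacher-generated trace into a single SFT target string.

    Hypothetical format: the reasoning is wrapped in <think> tags so the
    student model learns to emit its chain of thought before answering.
    """
    target = f"<think>\n{reasoning}\n</think>\n{answer}"
    return {"prompt": question, "completion": target}

sample = to_sft_sample(
    question="What is 7 * 8?",
    reasoning="7 * 8 = 56.",
    answer="56",
)
print(sample["completion"].endswith("56"))  # → True
```

The student is then fine-tuned on such (prompt, completion) pairs with a standard SFT objective; no RL is required on the student side.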
Model Downloads
DeepSeek-R1-Zero and DeepSeek-R1 are built on DeepSeek-V3-Base. For further details on the model architecture, consult the DeepSeek-V3 repository. Numerous distilled models based on the Qwen2.5 and Llama3 series are also available to support research efforts.
Evaluation Results
The performance of DeepSeek-R1 and its distilled variants has been evaluated across various benchmarks. The models demonstrate competitive results, excelling in categories such as math, code, and reasoning tasks, thereby establishing new standards for dense model performance.
Chat Website & API Platform
Interact with DeepSeek-R1 on our official website, chat.deepseek.com, or use the OpenAI-compatible API at platform.deepseek.com.
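Because the API is OpenAI-compatible, any OpenAI-style client can talk to it. The stdlib-only sketch below builds a `/chat/completions` request; the base URL and the `deepseek-reasoner` model identifier are assumptions (check platform.deepseek.com for the current values), and the actual network call is left commented out.

```python
import json
import urllib.request

API_BASE = "https://api.deepseek.com"  # assumed base URL; verify on platform.deepseek.com
MODEL = "deepseek-reasoner"            # assumed model identifier for DeepSeek-R1

def build_chat_request(user_message, api_key):
    """Build an OpenAI-compatible /chat/completions request (not sent here)."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # "sk-..." is a placeholder key
        },
        method="POST",
    )

req = build_chat_request("How many primes are below 20?", api_key="sk-...")
# resp = urllib.request.urlopen(req)  # uncomment to actually call the API
print(req.get_full_url())  # → https://api.deepseek.com/chat/completions
```

In practice most users would use the `openai` Python SDK with a custom `base_url` instead of raw `urllib`; the request shape is the same.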
How to Run Locally
For local deployment of the DeepSeek-R1 models, see the DeepSeek-V3 repository. The DeepSeek-R1-Distill models can be served with vLLM or SGLang, offering flexible configurations suitable for various research and development needs.
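For example, a distilled checkpoint can be served as an OpenAI-compatible endpoint with either framework. The commands below are a sketch: the model name and flags are illustrative, and exact options depend on your vLLM/SGLang versions and hardware.

```shell
# Serve a distilled checkpoint with vLLM (model name and flags are illustrative):
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
    --tensor-parallel-size 1 \
    --max-model-len 32768

# Or with SGLang:
python3 -m sglang.launch_server \
    --model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
    --port 30000
```

Both servers expose an OpenAI-compatible HTTP API, so the same client code used against platform.deepseek.com can point at the local endpoint instead.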
License
The DeepSeek-R1 series is licensed under the MIT License, permitting commercial use and modification. Distilled models derived from the Qwen2.5 and Llama series remain subject to their respective original licenses; these models are fine-tuned on reasoning data generated by DeepSeek-R1.
Contact
For inquiries, please contact us at service@deepseek.com.