DeepSeek-R1: Advancements in Generative AI Models

Introduction

Introducing DeepSeek-R1-Zero and DeepSeek-R1, our first-generation reasoning models. DeepSeek-R1-Zero, trained using large-scale reinforcement learning (RL) without a preliminary supervised fine-tuning (SFT) step, showcases remarkable reasoning abilities, though it encounters challenges such as endless repetition. DeepSeek-R1 addresses these by incorporating cold-start data before RL, achieving performance comparable to OpenAI-o1 across diverse tasks. Both models and several distilled variants are available to the research community, setting new benchmarks for dense models.

Model Summary

Post-Training: Large-Scale Reinforcement Learning

We applied RL directly to the base model, bypassing the need for SFT. This method facilitates the development of DeepSeek-R1-Zero, capable of self-verification, reflection, and generating extensive chains of thought (CoTs). Our research validates that pure RL suffices to enhance reasoning capabilities in large language models (LLMs).

Distillation: Smaller Models Can Be Powerful Too

Distillation allows the reasoning patterns of large models to be transferred effectively to smaller models, yielding strong performance at a fraction of the size. The open-source DeepSeek-R1 pipeline, alongside its API, helps the community distill efficient models using generated reasoning data.

Model Downloads

The DeepSeek-R1 models, DeepSeek-R1-Zero and DeepSeek-R1, are built on DeepSeek-V3-Base. For details on the model architecture, consult the DeepSeek-V3 repository. Several distilled models based on the Qwen2.5 and Llama3 series are also available to support research.

Evaluation Results

The performance of DeepSeek-R1 and its distilled variants has been evaluated across various benchmarks. The models demonstrate competitive results, excelling in categories such as math, code, and reasoning tasks, thereby establishing new standards for dense model performance.

Chat Website & API Platform

Interact with DeepSeek-R1 on our official website, chat.deepseek.com, or use the OpenAI-compatible API at platform.deepseek.com for a seamless experience.
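Because the API is OpenAI-compatible, a chat request is a standard chat-completions payload. The sketch below builds such a request body; the model identifier "deepseek-reasoner" and the endpoint path are assumptions here, so check the platform documentation for the exact values.

```python
import json

def build_chat_request(prompt: str, model: str = "deepseek-reasoner") -> str:
    """Build the JSON body for a POST to the assumed endpoint
    https://api.deepseek.com/chat/completions (OpenAI-compatible schema)."""
    payload = {
        "model": model,  # assumed model identifier; verify against platform docs
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return json.dumps(payload)

body = build_chat_request("How many primes are there below 100?")
print(body)
```

The same payload works with any OpenAI-compatible client library by pointing its base URL at the DeepSeek platform.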

How to Run Locally

For local deployment of DeepSeek-R1, see the DeepSeek-V3 repository. The DeepSeek-R1-Distill models can be served with vLLM or SGLang, offering flexible configurations for various research and development needs.
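As a minimal sketch, a distilled checkpoint can be served through vLLM's OpenAI-compatible server. The model ID and flags below are illustrative, not prescriptive; consult the vLLM documentation for the options supported by your version and hardware.

```shell
# Serve a distilled model with vLLM's OpenAI-compatible server.
# Flags are illustrative: adjust tensor parallelism and context
# length to match your GPU count and memory.
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
    --tensor-parallel-size 2 \
    --max-model-len 32768 \
    --enforce-eager
```

Once running, the server accepts the same chat-completions requests as the hosted API, so local and hosted deployments can share client code.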

License

The DeepSeek-R1 series is licensed under the MIT License, which permits commercial use and modification. The distilled models derived from the Qwen2.5 and Llama series were fine-tuned on a substantial dataset curated by DeepSeek-R1 and remain subject to their respective original licenses.

Contact

For inquiries, please contact us at service@deepseek.com.