TinyZero: A Breakthrough in Machine Learning
TinyZero is a reproduction of DeepSeek R1 Zero. Built on veRL, it trains a 3B base language model (LM) that develops self-verification and search abilities on its own through reinforcement learning, and the full experiment can be run for less than $30, making it easy to try hands-on.
For more details, see the Twitter thread or the full experiment log.
Installation and Setup
To install and begin experimenting with TinyZero, follow these steps:
Installation:
- Create a new Conda environment:

```bash
conda create -n zero python=3.9
```

- (Optional) Install Torch:

```bash
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
```

- Install vllm:

```bash
pip3 install vllm==0.6.3
```

- Install the remaining required packages:

```bash
pip3 install ray
pip install -e .
pip3 install flash-attn --no-build-isolation
pip install wandb IPython matplotlib
```
Training and Experimentation
Initiate the training by following the countdown task steps:
Data Preparation:
```bash
python ./examples/data_preprocess/countdown.py --local_dir {path_to_your_dataset}
```
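For context, the countdown task asks the model to combine a given set of numbers with basic arithmetic to hit a target value. Below is a minimal sketch of how a candidate equation could be verified; the function name and checks here are illustrative, not TinyZero's actual scoring code:

```python
import re

def verify_countdown(equation: str, numbers: list[int], target: int) -> bool:
    """Check that `equation` uses exactly the given numbers and evaluates
    to `target`. Illustrative sketch -- the real reward function may differ."""
    # The equation must use each provided number exactly once.
    used = [int(n) for n in re.findall(r"\d+", equation)]
    if sorted(used) != sorted(numbers):
        return False
    # Allow only digits, arithmetic operators, parentheses, and spaces.
    if not re.fullmatch(r"[\d+\-*/(). ]+", equation):
        return False
    try:
        return abs(eval(equation) - target) < 1e-9
    except (SyntaxError, ZeroDivisionError):
        return False

print(verify_countdown("(25 - 5) * 5", [25, 5, 5], 100))  # True
print(verify_countdown("25 * 5", [25, 5, 5], 125))        # False: one number unused
```

A rule-based check like this is what lets the model learn purely from outcome rewards, with no learned reward model in the loop.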
Training for Single GPU Models: suitable for models of 1.5B parameters or smaller, such as Qwen2.5-0.5B base, though models this small may fail to learn reasoning.
```bash
export N_GPUS=1
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=1
export EXPERIMENT_NAME=countdown-qwen2.5-0.5b
export VLLM_ATTENTION_BACKEND=XFORMERS
bash ./scripts/train_tiny_zero.sh
```
3B+ Models: at this scale, the base model is able to develop more sophisticated reasoning skills.
```bash
export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b
export VLLM_ATTENTION_BACKEND=XFORMERS
bash ./scripts/train_tiny_zero.sh
```
Instruct Ablation
For experiments with Qwen2.5-3B Instruct, the data must be reprocessed with the model's chat template:
Data Preparation:
```bash
python examples/data_preprocess/countdown.py --template_type=qwen-instruct --local_dir={path_to_your_dataset}
```
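For reference, Qwen instruct models expect ChatML-style prompts rather than raw text. The sketch below shows roughly what wrapping a countdown question in that format looks like; the exact template TinyZero applies is defined inside the preprocessing script, so treat this as illustrative:

```python
def to_chatml(user_msg: str, system_msg: str = "You are a helpful assistant.") -> str:
    """Wrap a message in the ChatML format used by Qwen instruct models
    (illustrative sketch, not TinyZero's preprocessing code)."""
    return (
        f"<|im_start|>system\n{system_msg}<|im_end|>\n"
        f"<|im_start|>user\n{user_msg}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(to_chatml("Using the numbers [25, 5, 5], create an equation that equals 100."))
```

This is why the instruct ablation needs its own dataset: the same countdown questions must be re-rendered through the chat template before training.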
Training:
```bash
export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b-instruct
export VLLM_ATTENTION_BACKEND=XFORMERS
bash ./scripts/train_tiny_zero.sh
```
Acknowledgments
These experiments were built on veRL and use the Qwen2.5 series of base models.
Citation
For academic references, please cite TinyZero as follows:
```bibtex
@misc{tinyzero,
  author       = {Jiayi Pan and Junjie Zhang and Xingyao Wang and Lifan Yuan},
  title        = {TinyZero},
  howpublished = {https://github.com/Jiayi-Pan/TinyZero},
  note         = {Accessed: }
}
```