Is AI making you dizzy? Many industry insiders share the sentiment. A few days ago, R1 emerged suddenly, alongside o1 and o3, yet with no sign of o2. This development has left many bewildered. This article serves as a guide to recent AI advancements, targeting those who feel overwhelmed by the rapid pace of change.
Timeline of Recent Developments
In the past few months, significant milestones have been reached:
- Sept 12, ’24: o1-preview launched
- Dec 5, ’24: Full version of o1 launched, along with o1-pro
- Dec 20, ’24: o3 announced; hailed by some as “AGI” after a breakthrough score on ARC-AGI
- Dec 26, ’24: DeepSeek V3 launched
- Jan 20, ’25: DeepSeek R1 launched, matching o1 but open source
- Jan 25, ’25: HKUST replicates R1 results
- Jan 25, ’25: Huggingface announces open-r1, a fully open-source replication of R1
Understanding the Models
- o1, o3, and R1 are reasoning models
- DeepSeek V3 is an LLM, a base model from which reasoning models are fine-tuned
- ARC-AGI is a benchmark of tasks that are easy for humans but hard for AI
Francois Chollet explains that ARC-AGI-1 was a basic assessment of fluid intelligence, designed to test adaptability and problem-solving in unfamiliar situations.
Reasoning & Agents
Reasoning models differ from agents. While reasoning models can “think” before responding by generating tokens, AI agents are defined by their autonomy and ability to interact with the outside world. Agents use software and sometimes hardware to make decisions and engage with the environment.
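To make the distinction concrete, here is a toy sketch. Every function in it is a hypothetical stand-in, not a real model or framework API: the "reasoning" call spends tokens thinking before it answers, while the "agent" loops over actions that touch the outside world through a tool.

```python
# Toy sketch of the distinction; all functions are hypothetical stand-ins.

def reasoning_answer(question: str) -> str:
    # A reasoning model emits intermediate "thinking" tokens, then an answer.
    thoughts = f"break '{question}' into steps; check each step"
    return f"<think>{thoughts}</think> final answer"

def calculator(expr: str) -> int:
    # Stand-in for an external tool the agent can call
    # (a side effect on the outside world in a real system).
    return eval(expr)

def agent(steps: list[str]) -> list[int]:
    # An agent loop: pick an action, execute it via a tool,
    # observe the result, and continue until the task is done.
    observations = []
    for step in steps:
        observations.append(calculator(step))
    return observations

print(agent(["2+2", "3*7"]))  # → [4, 21]
```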
Importance of Reasoning
Reasoning is critical because it enables task planning, supervision, and validation. Agents depend on reasoning models, not the other way around, and reasoning itself is currently the bottleneck. Once reasoning benchmarks are saturated, new agent-oriented challenges are likely to emerge.
Cost-Effective Reasoning
Agents operate autonomously, often around the clock, leading to significant costs. R1 is notable for being approximately 30 times cheaper than o1 while achieving similar performance.
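A back-of-the-envelope calculation shows where the roughly 30x figure comes from. The per-token prices below are approximate list prices at launch time and should be treated as assumptions, not authoritative pricing:

```python
# Rough cost comparison; prices are assumptions (USD per 1M tokens).
O1_IN, O1_OUT = 15.00, 60.00   # o1 API, input/output
R1_IN, R1_OUT = 0.55, 2.19     # DeepSeek R1 API, input/output

def daily_cost(in_millions: float, out_millions: float,
               in_price: float, out_price: float) -> float:
    return in_millions * in_price + out_millions * out_price

# An always-on agent consuming 10M input and 10M output tokens per day:
o1 = daily_cost(10, 10, O1_IN, O1_OUT)   # 750.0
r1 = daily_cost(10, 10, R1_IN, R1_OUT)   # ~27.4
print(f"o1 ${o1:.2f}/day vs R1 ${r1:.2f}/day ({o1 / r1:.0f}x)")
```

At these prices the gap works out to roughly 27x, in line with the ~30x claim; exact ratios shift with the input/output token mix.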
The Significance of R1
R1 matters for three reasons: it is cheap, it is open source, and it validates the approach OpenAI took with o1 and o3. Community predictions about how o1 works have been confirmed by R1’s published paper, which also offers insight into how subsequent models will scale.
R1’s open-source status fosters rapid innovation, exemplified by quick replications and iterations on its model.
AI Trajectory
The AI landscape is evolving rapidly:
Beyond Pretraining Scaling
The original pretraining scaling laws (more data plus more compute yields a better model) are fading in relevance: high-quality training data is getting harder to obtain, and new scaling regimes have emerged.
Inference Time Scaling Laws
Models like o1 and R1 perform better the longer they think. The challenge is spending that inference compute efficiently, and Chain of Thought (CoT) has emerged as the best method so far: a single linear chain of reasoning tokens, optimized with reinforcement learning.
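One way to see why a single linear chain is attractive: search-based alternatives that branch at every reasoning step generate exponentially many tokens, while one chain grows linearly with depth. A small illustration (pure arithmetic, not benchmark results):

```python
# Token-count intuition: a single linear chain of thought is
# compute-friendly compared with branching search over thoughts.

def tree_nodes(branching: int, depth: int) -> int:
    # Total nodes in a full tree: 1 + b + b^2 + ... + b^depth.
    return sum(branching ** level for level in range(depth + 1))

def chain_nodes(depth: int) -> int:
    # A linear chain visits one node per level.
    return depth + 1

for depth in (5, 10, 20):
    print(f"depth {depth}: tree={tree_nodes(3, depth):,} "
          f"chain={chain_nodes(depth)}")
```

With a branching factor of 3, a depth-20 search visits billions of nodes where a chain visits 21, which is why a single RL-optimized chain is the efficient choice.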
Model Downsizing
Recent trends favor smaller, faster models (e.g., the progression from GPT-4 to GPT-4-turbo to GPT-4o). This matters for reasoning in particular, because thinking consumes many tokens at inference time.
Reinforcement Learning
R1 uses GRPO (Group Relative Policy Optimization), a comparatively simple reinforcement learning scheme, to train its chain of thought. R1-Zero, a variant trained with pure RL and no supervised fine-tuning, shows the approach works on its own, albeit with language-mixing issues (the model switches languages mid-thought).
Experiments with methods like GRPO, PPO, and PRIME suggest that reinforcement learning for reasoning only pays off past a certain model size, with gains emerging above roughly 1.5B parameters.
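The central trick of GRPO, as described in the R1 and DeepSeekMath papers, fits in a few lines: sample a group of completions per prompt, score each with a reward, and use the group-normalized reward as the advantage, so no separate learned value network is needed. A minimal sketch of that normalization step:

```python
# Minimal sketch of GRPO's group-normalized advantage (a sketch of the
# core formula, not a full training loop).
from statistics import mean, pstdev

def grpo_advantages(rewards: list[float]) -> list[float]:
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:            # all completions in the group scored the same
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Four sampled answers to one prompt, rewarded 1.0 if correct else 0.0:
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # → [1.0, -1.0, -1.0, 1.0]
```

Completions that beat their own group's average get a positive advantage and are reinforced; the rest are suppressed.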
Model Distillation
R1’s training pipeline uses distillation: earlier checkpoints generate data for Supervised Fine-Tuning (SFT), which is followed by further reinforcement learning, and the final model is also distilled into smaller open models. This iterative process resembles techniques that major AI players like OpenAI are speculated to use.
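One round of this kind of distillation can be sketched as follows. The helper names are hypothetical: a stronger "teacher" checkpoint generates reasoning traces, traces with wrong final answers are filtered out (rejection sampling), and the survivors become the SFT dataset for the next or a smaller model.

```python
# Sketch of one distillation round; all names are hypothetical stand-ins.

def build_sft_dataset(teacher, prompts, is_correct):
    dataset = []
    for prompt in prompts:
        trace = teacher(prompt)            # chain of thought + answer
        if is_correct(prompt, trace):      # keep only verified traces
            dataset.append({"prompt": prompt, "completion": trace})
    return dataset

# Toy stand-ins: the "teacher" solves arithmetic prompts.
def toy_teacher(prompt: str) -> str:
    return f"<think>compute {prompt}</think> {eval(prompt)}"

def toy_check(prompt: str, trace: str) -> bool:
    return trace.endswith(str(eval(prompt)))

data = build_sft_dataset(toy_teacher, ["2+3", "4*5"], toy_check)
print(len(data))  # → 2
```

In a real pipeline the correctness check would be a verifier (unit tests, math answer matching), and the resulting dataset would feed the next SFT stage.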
Predictions for 2025
Considering current developments, AI is