Kimi k1.5: Scaling Reinforcement Learning with LLMs

The Kimi Team has unveiled Kimi k1.5, an o1-level multimodal model that sets new benchmarks in AI performance. It outperforms leading models such as GPT-4o and Claude 3.5 Sonnet on benchmarks including AIME, MATH-500, and LiveCodeBench, with short-CoT gains of up to +550%.

Overview

Kimi k1.5 marks a transformative step for the field, using reinforcement learning (RL) to expand its training signal through exploration with rewards rather than relying solely on static datasets. By combining RL training with careful technique choices and optimized infrastructure, it achieves results competitive with the strongest current models.

Abstract

Traditional language model training relies on next-token prediction and is ultimately bounded by the amount of available data. Kimi k1.5 instead taps reinforcement learning to scale further, letting the model generate its own training signal by exploring with rewards. The long-CoT model scores 77.5 on AIME and 96.2 on MATH-500, and reaches the 94th percentile on Codeforces. Long2short methods then transfer these gains to short-CoT models, which score 60.8 on AIME and 47.3 on LiveCodeBench.
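One long2short technique the team describes is shortest rejection sampling: sample several responses to a prompt and keep the shortest correct one as training data for the short-CoT model. A minimal sketch, assuming a simple candidate format of (response, is_correct) pairs; the function name and data layout here are illustrative, not the team's implementation:

```python
# Shortest rejection sampling: among several sampled responses,
# keep the shortest one that is correct, yielding concise training data.
def shortest_rejection_sample(candidates):
    """candidates: list of (response_text, is_correct) pairs (assumed format)."""
    correct = [text for text, ok in candidates if ok]
    if not correct:
        return None  # no correct sample for this prompt; skip it
    return min(correct, key=len)

samples = [
    ("Step 1 ... Step 7 ... therefore the hypotenuse is 5 cm.", True),
    ("By the Pythagorean theorem, the hypotenuse is 5 cm.", True),
    ("The hypotenuse is 6 cm.", False),
]
print(shortest_rejection_sample(samples))
```

The selected responses can then be used for supervised fine-tuning of a model that reasons more concisely.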

Key Ingredients of Kimi k1.5

Four ingredients underpin the model's RL training:

Long Context Scaling

The RL context window is scaled to 128k tokens. Partial rollouts reuse large chunks of previously generated trajectories instead of regenerating sequences from scratch, keeping training efficient at this length.

Improved Policy Optimization

Kimi k1.5 uses a variant of online mirror descent for robust policy optimization, combined with effective sampling strategies, a length penalty, and an optimized data recipe.
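The flavor of this objective can be illustrated with a simplified advantage computation: each sampled response is weighted by its reward minus the group's mean reward, plus a penalty that grows with response length. This is an assumption-laden sketch of the general idea, not the exact mirror-descent objective used in k1.5:

```python
# Simplified per-sample training weights: reward minus a mean-reward
# baseline, plus a length penalty that is 0 at min_len and
# -penalty_scale at max_len. Illustrative only; the actual k1.5
# objective (online mirror descent with regularization) is more involved.
def advantage_weights(rewards, lengths, min_len, max_len, penalty_scale=1.0):
    baseline = sum(rewards) / len(rewards)  # mean reward over the sampled group
    span = max(max_len - min_len, 1)        # avoid division by zero
    weights = []
    for r, n in zip(rewards, lengths):
        len_pen = -penalty_scale * (n - min_len) / span
        weights.append((r - baseline) + len_pen)
    return weights

# Two sampled responses: a short correct one and a long incorrect one.
print(advantage_weights([1.0, 0.0], [10, 20], min_len=10, max_len=20))
```

The length penalty nudges the policy toward shorter reasoning traces even before any long2short post-processing.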

Simplistic Framework

The model achieves strong results without complex techniques such as Monte Carlo tree search; the extended context length itself gives the model room for sophisticated planning, reflection, and self-correction.

Multimodal Capabilities

Joint training on text and vision data lets the model reason over both modalities, broadening its applicability across tasks.

Access and Testing

Users eager to explore Kimi k1.5 can request testing access via the Kimi OpenPlatform by filling out a test application form. The API is compatible with the OpenAI SDK, as shown in the example below.


from openai import OpenAI

# The Kimi API is OpenAI-compatible; point the client at the Moonshot endpoint.
client = OpenAI(
    api_key="YOUR_KIMI_KEY",
    base_url="https://api.moonshot.ai/v1",
)

messages = [
    {
        "role": "user",
        "content": "The lengths of the two legs of a right triangle are 3 cm and 4 cm respectively. Find the length of the hypotenuse of this right triangle.",
    },
]

# Stream the response so reasoning tokens appear as they are generated.
stream = client.chat.completions.create(
    model="kimi-k1.5-preview",
    messages=messages,
    temperature=0.3,
    stream=True,
    max_tokens=8192,
)

for chunk in stream:
    # Some chunks carry no choices or an empty delta; skip those.
    if chunk.choices and chunk.choices[0].delta and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

The official rollout of Kimi k1.5 on the Kimi platform is anticipated shortly, promising enhanced AI-driven solutions for a wide range of applications. Stay tuned for updates and access.