Author's Description
Kimi Linear is a hybrid linear attention architecture that outperforms traditional full attention methods across various contexts, including short, long, and reinforcement learning (RL) scaling regimes. At its core is Kimi Delta Attention (KDA)—a refined version of Gated DeltaNet that introduces a more efficient gating mechanism to optimize the use of finite-state RNN memory. Kimi Linear achieves superior performance and hardware efficiency, especially for long-context tasks. It reduces the need for large KV caches by up to 75% and boosts decoding throughput by up to 6x for contexts as long as 1M tokens.
Key Specifications
Supported Parameters
This model supports the following parameters:
Features
This model supports the following features:
Performance Summary
MoonshotAI's Kimi Linear 48B A3B Instruct model, featuring the innovative Kimi Delta Attention architecture, demonstrates exceptional performance characteristics, particularly in long-context scenarios. The model consistently ranks among the fastest available, indicating superior processing efficiency. While specific pricing data is unavailable, suggesting potential free-tier access or unreleased commercial details, its hardware efficiency and reduced KV cache requirements (up to 75%) imply significant cost-effectiveness in operational deployment. Reliability is a standout feature, with the model achieving a perfect 100% success rate across benchmarks, ensuring consistent and evaluable responses. A key strength lies in its architectural design, which significantly boosts decoding throughput by up to 6x for contexts as long as 1M tokens, making it highly suitable for demanding long-context applications. However, a notable weakness is observed in its hallucination benchmark, where it scored 0.0% accuracy. This indicates a complete failure to acknowledge uncertainty or identify fictional concepts, suggesting a need for improvement in its ability to differentiate between factual information and fabricated content. Despite this, its overall speed, reliability, and long-context handling capabilities position Kimi Linear as a powerful tool for specific high-throughput, long-sequence tasks.
Model Pricing
Current Pricing
| Feature | Price (per 1M tokens) |
|---|---|
| Prompt | $0.7 |
| Completion | $0.9 |
Price History
Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
|
Parasail
|
Parasail | moonshotai/kimi-linear-48b-a3b-instruct-20251029 | 1M | $0.7 / 1M tokens | $0.9 / 1M tokens |
Benchmark Results
| Benchmark | Category | Reasoning | Strategy | Free | Executions | Accuracy | Cost | Duration |
|---|
Other Models by moonshotai
|
|
Released | Params | Context |
|
Speed | Ability | Cost |
|---|---|---|---|---|---|---|---|
| MoonshotAI: Kimi Linear 48B A3B Instruct Unavailable | Nov 07, 2025 | 48B | 1M |
Text input
Text output
|
★★★★ | ★★ | $$$ |
| MoonshotAI: Kimi K2 Thinking | Nov 06, 2025 | ~1T | 262K |
Text input
Text output
|
★ | ★★★★★ | $$$$$ |
| MoonshotAI: Kimi K2 0905 | Sep 04, 2025 | ~32B | 262K |
Text input
Text output
|
★★ | ★★★ | $$$$ |
| MoonshotAI: Kimi K2 0905 (exacto) | Sep 04, 2025 | ~1T | 262K |
Text input
Text output
|
— | — | $$$$$ |
| MoonshotAI: Kimi K2 0711 | Jul 11, 2025 | ~1T | 131K |
Text input
Text output
|
★★★★ | ★★★★★ | $$ |
| MoonshotAI: Kimi Dev 72B | Jun 16, 2025 | 72B | 131K |
Text input
Text output
|
★ | ★★ | $$$$ |
| MoonshotAI: Kimi VL A3B Thinking Unavailable | Apr 10, 2025 | 3B | 131K |
Text input
Image input
Text output
|
★ | ★ | $$$ |