MoonshotAI: Kimi Linear 48B A3B Instruct

Text input Text output
Author's Description

Kimi Linear is a hybrid linear attention architecture that outperforms traditional full attention methods across various contexts, including short, long, and reinforcement learning (RL) scaling regimes. At its core is Kimi Delta Attention (KDA)—a refined version of Gated DeltaNet that introduces a more efficient gating mechanism to optimize the use of finite-state RNN memory. Kimi Linear achieves superior performance and hardware efficiency, especially for long-context tasks. It reduces the need for large KV caches by up to 75% and boosts decoding throughput by up to 6x for contexts as long as 1M tokens.

Key Specifications
Cost
$$$
Context
1M
Parameters
48B
Released
Nov 07, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Stop Frequency Penalty Top P Response Format Temperature Logprobs Min P Max Tokens Presence Penalty Logit Bias Seed Structured Outputs Top Logprobs
Features

This model supports the following features:

Response Format Structured Outputs
Performance Summary

MoonshotAI's Kimi Linear 48B A3B Instruct, featuring the innovative Kimi Delta Attention architecture, demonstrates strong performance in specific areas, particularly excelling in hardware efficiency for long-context tasks. The model performs among the fastest, ranking in the 77th percentile for speed, and offers competitive pricing, placing in the 64th percentile. Notably, it exhibits exceptional reliability with a 99% success rate, indicating consistent operational stability. Analysis of benchmark results reveals a mixed performance profile. Kimi Linear shows a significant strength in Email Classification, achieving 98.0% accuracy, placing it in the 65th percentile. Its core architectural benefits are evident in its ability to reduce KV cache needs by up to 75% and boost decoding throughput by up to 6x for 1M token contexts. However, the model struggles with tasks requiring deep understanding and complex reasoning, such as General Knowledge (28.0% accuracy), Ethics (46.0% accuracy), and Hallucinations (68.0% accuracy, indicating a tendency to hallucinate). While its Mathematics and Reasoning scores are moderate, Instruction Following and Coding also present areas for improvement. Overall, Kimi Linear is a highly reliable and efficient model for specific classification tasks and long-context processing, but its general cognitive abilities require further development.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.3
Completion $0.6

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Parasail
Parasail | moonshotai/kimi-linear-48b-a3b-instruct-20251029 1M $0.3 / 1M tokens $0.6 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by moonshotai