MoonshotAI: Kimi Linear 48B A3B Instruct

Text input Text output Unavailable
Author's Description

Kimi Linear is a hybrid linear attention architecture that outperforms traditional full attention methods across various contexts, including short, long, and reinforcement learning (RL) scaling regimes. At its core is Kimi Delta Attention (KDA)—a refined version of Gated DeltaNet that introduces a more efficient gating mechanism to optimize the use of finite-state RNN memory. Kimi Linear achieves superior performance and hardware efficiency, especially for long-context tasks. It reduces the need for large KV caches by up to 75% and boosts decoding throughput by up to 6x for contexts as long as 1M tokens.

Key Specifications
Cost
$$$$
Context
1M
Parameters
48B
Released
Nov 07, 2025
Ability
Supported Parameters

This model supports the following parameters:

Structured Outputs Frequency Penalty Top Logprobs Top P Seed Min P Temperature Stop Response Format Max Tokens Presence Penalty Logit Bias Logprobs
Features

This model supports the following features:

Structured Outputs Response Format
Performance Summary

MoonshotAI's Kimi Linear 48B A3B Instruct model, featuring the innovative Kimi Delta Attention architecture, demonstrates exceptional performance characteristics, particularly in long-context scenarios. The model consistently ranks among the fastest available, indicating superior processing efficiency. While specific pricing data is unavailable, suggesting potential free-tier access or unreleased commercial details, its hardware efficiency and reduced KV cache requirements (up to 75%) imply significant cost-effectiveness in operational deployment. Reliability is a standout feature, with the model achieving a perfect 100% success rate across benchmarks, ensuring consistent and evaluable responses. A key strength lies in its architectural design, which significantly boosts decoding throughput by up to 6x for contexts as long as 1M tokens, making it highly suitable for demanding long-context applications. However, a notable weakness is observed in its hallucination benchmark, where it scored 0.0% accuracy. This indicates a complete failure to acknowledge uncertainty or identify fictional concepts, suggesting a need for improvement in its ability to differentiate between factual information and fabricated content. Despite this, its overall speed, reliability, and long-context handling capabilities position Kimi Linear as a powerful tool for specific high-throughput, long-sequence tasks.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.7
Completion $0.9

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Parasail
Parasail | moonshotai/kimi-linear-48b-a3b-instruct-20251029 1M $0.7 / 1M tokens $0.9 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by moonshotai