Author's Description
MiniMax-M1 is a large-scale, open-weight reasoning model designed for extended context and high-efficiency inference. It leverages a hybrid Mixture-of-Experts (MoE) architecture paired with a custom "lightning attention" mechanism, allowing it to process long sequences—up to 1 million tokens—while maintaining competitive FLOP efficiency. With 456 billion total parameters and 45.9B active per token, this variant is optimized for complex, multi-step reasoning tasks. Trained via a custom reinforcement learning pipeline (CISPO), M1 excels in long-context understanding, software engineering, agentic tool use, and mathematical reasoning. Benchmarks show strong performance across FullStackBench, SWE-bench, MATH, GPQA, and TAU-Bench, often outperforming other open models like DeepSeek R1 and Qwen3-235B.
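The sparsity implied by these figures can be checked with a quick calculation; this is a minimal sketch, and the variable names below are illustrative, not from the model's own documentation:

```python
# Parameter counts from the description above (in billions).
total_params_b = 456.0   # total parameters in the MoE
active_params_b = 45.9   # parameters active per token

# Fraction of weights engaged on any given forward pass.
ratio = active_params_b / total_params_b
print(f"{ratio:.1%}")  # → 10.1%
```

In other words, roughly one tenth of the model's weights are active per token, which is what keeps per-token FLOPs competitive despite the 456B total size.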
Performance Summary
MiniMax M1 (extended) ranks among the fastest models across three benchmarks and among the most cost-effective across two. No specific reliability data is available, so its overall profile must be read from the benchmark results, which are sharply mixed. The model scored 0.0% accuracy on the General Knowledge benchmark and 0.0% on the Ethics benchmark, both significant weaknesses. By contrast, it achieved a strong 99.0% accuracy on Email Classification (73rd percentile) at a competitive cost of $0.0717 (15th percentile), though that run took 2,354,404 ms, roughly 39 minutes (2nd percentile), suggesting real inefficiency in processing time despite the high accuracy. Overall, MiniMax M1 excels at specific classification tasks and offers impressive speed and cost-efficiency, but it shows serious limitations in general knowledge and ethical reasoning. Its strengths lie in its extended context window, MoE architecture, and custom attention mechanism, which target complex multi-step reasoning, software engineering, and agentic tool use, as evidenced by strong performance on specialized benchmarks not detailed here.
Model Pricing
Current Pricing
| Feature | Price (per 1M tokens) |
|---|---|
| Prompt | $0 |
| Completion | $0 |
Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
| Novita | minimax/minimax-m1:extended | 128K | $0 / 1M tokens | $0 / 1M tokens |
| Chutes | minimax/minimax-m1:extended | 512K | $0 / 1M tokens | $0 / 1M tokens |
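The endpoint slugs above follow the router-style `provider/model:variant` convention, so the model is presumably reachable through an OpenAI-compatible chat-completions request. The sketch below only assembles such a request; the base URL, header names, and `max_tokens` value are assumptions for illustration, not confirmed by this page:

```python
import json

def build_request(prompt: str, model: str = "minimax/minimax-m1:extended") -> dict:
    """Assemble a chat-completions payload for the model slug listed above.

    Hypothetical sketch: assumes an OpenAI-compatible /chat/completions
    endpoint; max_tokens is an arbitrary illustrative value.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
    }

payload = build_request("Summarize the CISPO training pipeline in one sentence.")
body = json.dumps(payload)  # ready to POST, with an Authorization header, to the provider's URL
print(payload["model"])  # → minimax/minimax-m1:extended
```

With both listed endpoints priced at $0 per 1M tokens, the main practical difference between them is the context length (128K on Novita vs. 512K on Chutes).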
Benchmark Results
| Benchmark | Category | Reasoning | Strategy | Free | Executions | Accuracy | Cost | Duration |
|---|---|---|---|---|---|---|---|---|
Other Models by minimax
| Model | Released | Params | Context | Modalities | Speed | Ability | Cost |
|---|---|---|---|---|---|---|---|
| MiniMax: MiniMax M1 | Jun 17, 2025 | — | 1M | Text input, Text output | ★ | ★★★★ | $$$$$ |
| MiniMax: MiniMax-01 | Jan 14, 2025 | ~456B | 1M | Text input, Image input, Text output | ★★★ | ★★ | $$$ |