Image input
Audio input
Video input
Text input
Text output
Author's Description
MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic performance at roughly half the inference cost, while surpassing MiMo-V2-Omni in multimodal perception across image and video understanding...
Key Specifications
Context
1M
Released
Apr 22, 2026
Supported Parameters
This model supports the following parameters:
Tool Choice
Include Reasoning
Tools
Response Format
Temperature
Max Tokens
Reasoning
Presence Penalty
Stop
Top P
Frequency Penalty
Features
This model supports the following features:
Tools
Reasoning
Response Format
Model Pricing
Current Pricing
| Feature | Price (per 1M tokens) |
|---|---|
| Prompt | $0.4 |
| Completion | $2 |
| Input Cache Read | $0.08 |
Price History
Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
|
Xiaomi
|
Xiaomi | xiaomi/mimo-v2.5-20260422 | 1M | $0.4 / 1M tokens | $2 / 1M tokens |
Benchmark Results
| Benchmark | Category | Reasoning | Strategy | Free | Executions | Accuracy | Cost | Duration |
|---|
Other Models by xiaomi
|
|
Released | Params | Context |
|
Speed | Ability | Cost |
|---|---|---|---|---|---|---|---|
| Xiaomi: MiMo-V2.5-Pro Unavailable | Apr 22, 2026 | — | 1M |
Text input
Text output
|
★★ | ★★★★★ | $$$$$ |
| Xiaomi: MiMo-V2.5-Pro | Apr 22, 2026 | — | 1M |
Text input
Text output
|
— | — | — |
| Xiaomi: MiMo-V2.5 Unavailable | Apr 22, 2026 | — | 1M |
Image input
Audio input
Video input
Text input
Text output
|
★★★ | ★★★★★ | $$$$$ |
| Xiaomi: MiMo-V2-Omni | Mar 18, 2026 | — | 262K |
Image input
Audio input
Video input
Text input
Text output
|
★★ | ★★★★★ | $$$$$ |
| Xiaomi: MiMo-V2-Pro | Mar 18, 2026 | ~1T | 1M |
Text input
Text output
|
★ | ★★★★★ | $$$$$ |
| Xiaomi: MiMo-V2-Flash | Dec 14, 2025 | ~309B | 262K |
Text input
Text output
|
★★★★ | ★★★ | $$ |