Xiaomi: MiMo-V2.5

Image input Audio input Video input Text input Text output
Author's Description

MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic performance at roughly half the inference cost, while surpassing MiMo-V2-Omni in multimodal perception across image and video understanding...

Key Specifications
Context
1M
Released
Apr 22, 2026
Supported Parameters

This model supports the following parameters:

Tool Choice Include Reasoning Tools Response Format Temperature Max Tokens Reasoning Presence Penalty Stop Top P Frequency Penalty
Features

This model supports the following features:

Tools Reasoning Response Format
Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.4
Completion $2
Input Cache Read $0.08

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Xiaomi
Xiaomi | xiaomi/mimo-v2.5-20260422 1M $0.4 / 1M tokens $2 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by xiaomi