Xiaomi: MiMo-V2-Omni

Text input Video input Image input Audio input Text output
Author's Description

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step planning, tool use, and code execution - making it well-suited for complex real-world tasks that span modalities. 256K context window.

Key Specifications
Cost
$$$$$
Context
262K
Released
Mar 18, 2026
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Temperature Stop Frequency Penalty Response Format Include Reasoning Tool Choice Reasoning Max Tokens Top P Presence Penalty Tools
Features

This model supports the following features:

Tools Response Format Reasoning
Performance Summary

The Xiaomi MiMo-V2-Omni model demonstrates moderate speed performance, ranking in the 35th percentile across various benchmarks. Its pricing tends to be at premium levels, positioned in the 14th percentile. A standout feature is its exceptional reliability, achieving a 100% success rate across all evaluated benchmarks, indicating consistent and stable operation. In terms of performance across categories, MiMo-V2-Omni exhibits strong capabilities in several areas. It achieves perfect accuracy in both General Knowledge and Ethics, with the former also being highlighted as the most accurate model at its price point and speed. Its Reasoning and Mathematics scores are particularly impressive, ranking in the 90th and 97th percentiles respectively, showcasing advanced problem-solving and quantitative skills. Coding also shows strong performance at 92% accuracy. While its Hallucinations benchmark is respectable at 94% accuracy, it falls in the 47th percentile, suggesting some room for improvement in acknowledging uncertainty. Instruction Following, at 69% accuracy, is a relative weakness, placing it in the 73rd percentile but indicating challenges with complex multi-step directives. Email Classification is solid at 98% accuracy. Overall, the model's key strengths lie in its high accuracy in knowledge-based tasks, complex reasoning, and mathematical problem-solving, coupled with robust reliability. Its primary area for development appears to be in handling highly complex, multi-layered instructions.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.4
Completion $2
Input Cache Read $0.08

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Xiaomi
Xiaomi | xiaomi/mimo-v2-omni-20260318 262K $0.4 / 1M tokens $2 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by xiaomi