Xiaomi: MiMo-V2-Omni

Image input Audio input Video input Text input Text output
Author's Description

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step...

Key Specifications
Cost
$$$$$
Context
262K
Released
Mar 18, 2026
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Tool Choice Include Reasoning Tools Response Format Temperature Max Tokens Reasoning Presence Penalty Stop Top P Frequency Penalty
Features

This model supports the following features:

Tools Reasoning Response Format
Performance Summary

Xiaomi's MiMo-V2-Omni demonstrates moderate speed performance, ranking in the 35th percentile across benchmarks. Its pricing tends to be at premium levels, positioned in the 14th percentile. A standout feature is its exceptional reliability, achieving a 100% success rate across all evaluated benchmarks, indicating consistent and dependable operation. The model exhibits strong capabilities across several critical areas. It achieves perfect accuracy in both General Knowledge and Ethics, with the former also being the most accurate model at its price point and speed. Its Mathematics performance is particularly impressive, scoring 97.0% accuracy and ranking in the 97th percentile. Reasoning also shows high proficiency at 98.0% accuracy, placing it in the 90th percentile. While its Hallucinations score of 94.0% is respectable, it falls in the 47th percentile, suggesting some room for improvement in acknowledging uncertainty. Instruction Following and Coding are solid at 69.0% and 92.0% respectively, both ranking in the 73rd percentile. Email Classification is competent at 98.0% accuracy. Overall, MiMo-V2-Omni is a robust omni-modal model with significant strengths in complex reasoning, mathematical problem-solving, and ethical considerations, underpinned by its high reliability. Its primary areas for potential enhancement lie in reducing hallucinations and optimizing its premium cost structure.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.4
Completion $2
Input Cache Read $0.08

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Xiaomi
Xiaomi | xiaomi/mimo-v2-omni-20260318 262K $0.4 / 1M tokens $2 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by xiaomi