Xiaomi: MiMo-V2-Omni

Audio input Text input Image input Video input Text output
Author's Description

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step...

Key Specifications
Cost
$$$$$
Context
262K
Released
Mar 18, 2026
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Tools Tool Choice Temperature Include Reasoning Reasoning Presence Penalty Max Tokens Response Format Frequency Penalty Top P Stop
Features

This model supports the following features:

Reasoning Response Format Tools
Performance Summary

Xiaomi's MiMo-V2-Omni demonstrates moderate speed performance, ranking in the 35th percentile across benchmarks. Its pricing tends to be at premium levels, positioned in the 14th percentile. A standout feature is its exceptional reliability, achieving a 100% success rate across all evaluated benchmarks, indicating consistent and dependable operation. The model exhibits strong capabilities across several critical areas. It achieves perfect accuracy in both General Knowledge and Ethics, with the former also being the most accurate model at its price point and speed. Its Mathematics performance is particularly impressive, scoring 97.0% accuracy and ranking in the 97th percentile. Reasoning also shows high proficiency at 98.0% accuracy, placing it in the 90th percentile. While its Hallucinations score of 94.0% is respectable, it falls in the 47th percentile, suggesting some room for improvement in acknowledging uncertainty. Instruction Following and Coding are solid at 69.0% and 92.0% respectively, both ranking in the 73rd percentile. Email Classification is competent at 98.0% accuracy. Overall, MiMo-V2-Omni is a robust omni-modal model with significant strengths in complex reasoning, mathematical problem-solving, and ethical considerations, underpinned by its high reliability. Its primary areas for potential enhancement lie in reducing hallucinations and optimizing its premium cost structure.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.4
Completion $2
Input Cache Read $0.08

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Xiaomi
Xiaomi | xiaomi/mimo-v2-omni-20260318 262K $0.4 / 1M tokens $2 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by xiaomi