Xiaomi stuns with new MiMo-V2-Pro LLM nearing GPT-5.2, Opus 4.6 performance at a fraction of the cost

by CryptoExpert



Chinese electronics and car manufacturer Xiaomi surprised the global AI community today with the release of MiMo-V2-Pro, a new 1-trillion-parameter foundation model whose benchmarks approach those of U.S. AI giants OpenAI and Anthropic at roughly one-sixth to one-seventh of the cost when accessed via its proprietary API, so long as each request stays within the model's 256,000-token lower pricing tier.

The effort was led by Fuli Luo, a veteran of the disruptive DeepSeek R1 project, who characterizes the release as a "quiet ambush" on the global frontier. Luo also stated in an X post that the company does plan to open source a model variant from this latest release "when the models are stable enough to deserve it."

By focusing on the "action space" of intelligence—moving from code generation to the autonomous operation of digital "claws"—Xiaomi is attempting to leapfrog the conversational paradigm entirely.

Prior to this foray into frontier AI, Beijing-based Xiaomi established itself as a titan of "The Internet of Things" and consumer hardware.


Globally recognized as the world’s third-largest smartphone manufacturer, Xiaomi spent the early 2020s executing a high-stakes entry into the automotive sector. Its electric vehicles (EVs), such as the SU7 and the recently launched YU7 SUV, have turned the company into a vertically integrated powerhouse capable of merging hardware, software, and now, advanced reasoning.

This pedigree in physical-world engineering informs MiMo-V2-Pro’s architecture; it is built to be the "brain" of complex systems, whether those systems are managing global supply chains or navigating the intricate scaffolds of an autonomous coding agent.

Technology: The architecture of agency

The central challenge of the "Agent Era" is maintaining high-fidelity reasoning over massive spans of data without incurring a prohibitive "intelligence tax" in latency or cost. MiMo-V2-Pro addresses this through a sparse architecture: while it houses 1T total parameters, only 42B are active during any single forward pass, making it roughly three times the size of its predecessor, MiMo-V2-Flash.
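The arithmetic behind that sparsity is simple; a minimal sketch, using only the parameter counts quoted above (the function name is illustrative):

```python
# Sparse mixture-of-experts activation in one line of arithmetic: only a
# small fraction of total weights participates in any forward pass.
# Figures are the ones quoted in this article (1T total, 42B active).

def active_fraction(total_params: float, active_params: float) -> float:
    """Fraction of the model's weights used per token."""
    return active_params / total_params

# 42B active out of 1T total -> ~4.2% of weights fire per token.
print(f"{active_fraction(1e12, 42e9):.1%}")  # 4.2%
```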

The model’s efficiency is rooted in an evolved Hybrid Attention mechanism. Standard transformers typically face a quadratic increase in compute requirements as context grows; MiMo-V2-Pro utilizes a 7:1 hybrid ratio (increased from 5:1 in the Flash version) to manage its massive 1M-token context window. This architectural choice allows the model to maintain a deep "memory" of long-running tasks without the performance degradation usually seen in frontier models.
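One way to picture a 7:1 hybrid stack is as a repeating layer plan in which every eighth layer applies full global attention and the rest use a cheaper local variant. The sketch below is a hypothetical illustration of that pattern, not Xiaomi's published architecture:

```python
# Hypothetical layer plan for a ratio:1 hybrid-attention stack -- `ratio`
# cheap (sliding-window / linear) layers for every one full-attention
# layer. Layer counts and type names are illustrative only.

def hybrid_layer_plan(num_layers: int, ratio: int = 7) -> list[str]:
    """Assign an attention type to each layer of a ratio:1 hybrid stack."""
    plan = []
    for i in range(num_layers):
        if (i + 1) % (ratio + 1) == 0:
            plan.append("full")   # dense global attention over the window
        else:
            plan.append("local")  # cheap local/linear attention
    return plan

plan = hybrid_layer_plan(16)
# With 16 layers at 7:1, only 2 layers pay the quadratic attention cost.
print(plan.count("full"), plan.count("local"))  # 2 14
```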

The analogy: Think of the model not as a student reading a book page-by-page, but as an expert researcher in a vast library. The 7:1 ratio allows the model to "skim" roughly 87% of the data for context while applying high-density attention to the remaining 13% most relevant to the task at hand.

This is paired with a lightweight Multi-Token Prediction (MTP) layer, which allows the model to anticipate and generate multiple tokens simultaneously, drastically reducing the latency required for the "thinking" phases of agentic workflows. According to Luo, these structural decisions were made months in advance, specifically to provide a "structural advantage" for the unexpected speed at which the industry shifted toward agents.

Product and benchmarking: A third-party reality check

Xiaomi’s internal data paints a picture of a model that excels in "real-world" tasks over synthetic benchmarks. On GDPval-AA, a benchmark measuring performance on agentic real-world work tasks, MiMo-V2-Pro achieved an Elo of 1426, placing it ahead of major Chinese peers like GLM-5 (1406) and Kimi K2.5 (1283).

While it still trails Western "max effort" models like Claude Sonnet 4.6 (1633) in raw Elo, it represents the highest recorded performance for a Chinese-origin model in this category.

The third-party benchmarking organization Artificial Analysis verified these claims, ranking MiMo-V2-Pro #10 on its global Intelligence Index with a score of 49, in the same tier as GPT-5.2 Codex and ahead of Grok 4.20 Beta. These results suggest that Xiaomi has successfully built a model capable of the high-level reasoning required for engineering and production tasks.

Key metrics from Artificial Analysis highlight a significant leap over the previous open-weights version, MiMo-V2-Flash (which scored 41):

  • Hallucination rate: The Pro model reduced hallucination rates to 30%, a sharp improvement over the Flash model’s 48%.

  • Omniscience index: It scored a +5, placing it ahead of GLM-5 (+2) and Kimi K2.5 (-8).

  • Token efficiency: To run the entire Intelligence Index, MiMo-V2-Pro required only 77M output tokens, significantly less than GLM-5 (109M) or Kimi K2.5 (89M), indicating a more concise and efficient reasoning process.

Xiaomi’s own charts further emphasize its "General Agent" and "Coding Agent" capabilities. On ClawEval, a benchmark for agentic scaffolds, the model scored 61.5, approaching the performance of Claude Opus 4.6 (66.3) and significantly outpacing GPT-5.2 (50.0). In coding-specific environments like Terminal-Bench 2.0, it achieved an 86.7, suggesting high reliability when executing commands in a live terminal environment.

How enterprises should evaluate MiMo-V2-Pro for usage

For the personas outlined in contemporary AI organizations—from Infrastructure to Security—MiMo-V2-Pro represents a paradigm shift in the "Price-Quality" curve.

Infrastructure decision-makers will find MiMo-V2-Pro a compelling candidate for the Pareto frontier of intelligence vs. cost. Artificial Analysis reported that running its full Intelligence Index cost only $348 with MiMo-V2-Pro, compared to $2,304 for GPT-5.2 and $2,486 for Claude Opus 4.6.

For organizations managing GPU clusters or procurement, the ability to access top-10 global intelligence at roughly 1/7th the cost of Western incumbents is a powerful incentive for production-scale testing.
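Those figures can be sanity-checked directly; a quick calculation using only the index-run costs quoted above shows where the "roughly 1/7th" framing comes from:

```python
# Cost-gap check using the Artificial Analysis index-run costs (USD)
# quoted in this article.
index_cost = {"MiMo-V2-Pro": 348, "GPT-5.2": 2304, "Claude Opus 4.6": 2486}

for model in ("GPT-5.2", "Claude Opus 4.6"):
    ratio = index_cost[model] / index_cost["MiMo-V2-Pro"]
    print(f"{model}: {ratio:.1f}x the cost of MiMo-V2-Pro")
# GPT-5.2: 6.6x the cost of MiMo-V2-Pro
# Claude Opus 4.6: 7.1x the cost of MiMo-V2-Pro
```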

Data decision-makers can leverage the 1M context window for RAG-ready architectures, allowing them to feed entire enterprise codebases or documentation sets into a single prompt without the fragmentation required by smaller context models.
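Whether a given codebase actually fits in a 1M-token window can be estimated with the common ~4-characters-per-token heuristic; both the heuristic and the sample sizes below are rough assumptions for illustration:

```python
# Back-of-envelope check of whether a corpus fits MiMo-V2-Pro's stated
# 1M-token context window. The 4 chars/token figure is a rough average
# for English text and code, not an exact tokenizer measurement.

CHARS_PER_TOKEN = 4          # rough heuristic, varies by tokenizer
CONTEXT_WINDOW = 1_000_000   # stated window size in tokens

def fits_in_context(total_chars: int, reserve_for_output: int = 50_000) -> bool:
    """Estimate whether `total_chars` of text fits, leaving output headroom."""
    est_tokens = total_chars / CHARS_PER_TOKEN
    return est_tokens + reserve_for_output <= CONTEXT_WINDOW

# A ~3 MB codebase (~750K tokens) fits with headroom; a ~5 MB one does not.
print(fits_in_context(3_000_000), fits_in_context(5_000_000))  # True False
```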

A systems/orchestration decision-maker should evaluate MiMo-V2-Pro as a primary "brain" for multi-agent coordination. Because the model is optimized for OpenClaw and Claude Code, it can handle long-horizon planning and precise tool use without the constant human intervention that plagues earlier models.

Its high ranking in GDPval-AA suggests it is particularly well-suited for the workflow and orchestration layer needed to scale AI across the enterprise. It allows for the creation of systems that can move beyond simple automation into complex, multi-step problem solving.

However, security decision-makers must exercise caution. The very "agentic" nature that makes the model powerful—its ability to use terminals and manipulate files—increases the surface area for prompt injection and unauthorized model access.

While its low hallucination rate (30%) is a defensive boon, the lack of public weights (unlike the Flash version) means internal security teams cannot perform the deep "model-level" audits sometimes required for highly sensitive deployments. Any enterprise implementation must be accompanied by robust monitoring and auditability protocols.
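What such monitoring might look like in practice: a minimal audit wrapper that records every agent tool invocation. The tool name, log format, and stubbed shell runner are all illustrative, not part of any Xiaomi API:

```python
# Minimal audit trail for agent tool calls: a decorator that logs every
# invocation (tool name, arguments, outcome, timestamp) so security teams
# can review what an agent actually did. All names here are hypothetical.
import time

AUDIT_LOG: list[dict] = []

def audited(tool_name: str):
    """Decorator that records each call to an agent tool in AUDIT_LOG."""
    def wrap(fn):
        def inner(*args, **kwargs):
            entry = {"tool": tool_name, "args": repr(args),
                     "kwargs": repr(kwargs), "ts": time.time()}
            try:
                result = fn(*args, **kwargs)
                entry["status"] = "ok"
                return result
            except Exception as exc:
                entry["status"] = f"error: {exc}"
                raise
            finally:
                AUDIT_LOG.append(entry)  # logged on success and failure
        return inner
    return wrap

@audited("shell.run")
def run_shell(cmd: str) -> str:
    # Stub: a real implementation would sandbox and execute `cmd`.
    return f"ran: {cmd}"

run_shell("ls -la")
print(AUDIT_LOG[0]["tool"], AUDIT_LOG[0]["status"])  # shell.run ok
```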

Pricing, availability, and the path forward

Xiaomi has priced MiMo-V2-Pro to dominate the developer market. The pricing is tiered based on context usage, with competitive rates for caching to support high-frequency reasoning tasks.

  • MiMo-V2-Pro (up to 256K): $1 per 1M input tokens and $3 per 1M output tokens

  • MiMo-V2-Pro (256K-1M): $2 per 1M input tokens and $6 per 1M output tokens

  • Cache read: $0.20 per 1M tokens for the lower tier and $0.40 for the higher tier

  • Cache write: Temporarily free ($0)
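Under stated assumptions (the whole request is billed at the tier its combined context falls into, and cache writes stay free), a minimal cost estimator for this price list might look like:

```python
# Sketch of a per-request cost estimator for the tiered price list above.
# The tier-selection rule (bill the whole request at the tier its total
# context falls into) is an assumption; Xiaomi's billing may differ.
# Rates are USD per 1M tokens.

RATES = {
    "base":     {"input": 1.00, "output": 3.00, "cache_read": 0.20},  # up to 256K
    "extended": {"input": 2.00, "output": 6.00, "cache_read": 0.40},  # 256K-1M
}

def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimate USD cost of one request under the tiered price list."""
    tier = "base" if input_tokens + output_tokens <= 256_000 else "extended"
    r = RATES[tier]
    billable_input = input_tokens - cached_tokens  # cache writes are free
    cost = (billable_input * r["input"]
            + output_tokens * r["output"]
            + cached_tokens * r["cache_read"]) / 1_000_000
    return round(cost, 4)

# 100K-token prompt with 20K cached, 8K output -> base tier.
print(estimate_cost(100_000, 8_000, cached_tokens=20_000))  # 0.108
```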

Here's how it stacks up to other leading frontier models around the world:

Model                     Input      Output     Total      Source
Grok 4.1 Fast             $0.20      $0.50      $0.70      xAI
MiniMax M2.7              $0.30      $1.20      $1.50      MiniMax
Gemini 3 Flash            $0.50      $3.00      $3.50      Google
Kimi-K2.5                 $0.60      $3.00      $3.60      Moonshot
MiMo-V2-Pro (≤256K)       $1.00      $3.00      $4.00      Xiaomi MiMo
GLM-5-Turbo               $0.96      $3.20      $4.16      OpenRouter
GLM-5                     $1.00      $3.20      $4.20      Z.ai
Claude Haiku 4.5          $1.00      $5.00      $6.00      Anthropic
Qwen3-Max                 $1.20      $6.00      $7.20      Alibaba Cloud
Gemini 3 Pro              $2.00      $12.00     $14.00     Google
GPT-5.2                   $1.75      $14.00     $15.75     OpenAI
GPT-5.4                   $2.50      $15.00     $17.50     OpenAI
Claude Sonnet 4.5         $3.00      $15.00     $18.00     Anthropic
Claude Opus 4.6           $5.00      $25.00     $30.00     Anthropic
GPT-5.4 Pro               $30.00     $180.00    $210.00    OpenAI

(All prices in USD per 1M tokens; Total = input rate + output rate.)

This aggressive positioning is designed to encourage the high-intensity application flows that define the next generation of software. The model is currently available via Xiaomi’s first-party API only, with no current support for image or multimodal input—a notable omission in an era of "Omni" models, though Xiaomi has teased a separate MiMo-V2-Omni for those needs.

The "Hunter Alpha" period on OpenRouter proved that the market has a high appetite for this specific blend of efficiency and reasoning. Fuli Luo’s philosophy—that research velocity is fueled by a "genuine love for the world you're building for"—has resulted in a model that ranks 2nd in China and 8th worldwide on established intelligence indices.

Whether it remains a "quiet" ambush or becomes the foundation for a global realignment of AI power depends on how quickly developers adopt the "action space" over the "chat window". For now, Xiaomi has moved the goalposts: the question is no longer just "can it talk?" but "can it act?"


