Xiaomi MiMo Code AI Surpasses Claude in Ultra-Long 200+ Step Coding Tasks

Xiaomi’s AI division, MiMo, has released MiMo Code V0.1.0, an open-source, terminal-native AI coding assistant. The company claims this new tool surpasses Anthropic’s Claude Code in key benchmarks, particularly for long-horizon, multi-step coding tasks involving over 200 steps, based on internal beta testing and a survey of 576 developers.

In conjunction with this release, Xiaomi is offering limited-time free access to MiMo-V2.5, its flagship multimodal model featuring a million-token context window, with no registration required.

The announcement, made on June 10, 2026, via the official @XiaomiMiMo X account, positioned the tool as “more than an AI coding assistant in your terminal — it’s the smartest coding partner you’ll ever work with.”

MiMo Code is now available on GitHub under an MIT license. Installation is straightforward via a single terminal command on macOS and Linux (curl -fsSL https://mimo.xiaomi.com/install | bash) or via npm on Windows (npm install -g @mimo-ai/cli).

This project builds upon the open-source OpenCode agent, with Xiaomi enhancing it through a proprietary memory architecture, refined workflow modes, and an integrated model harness.

Addressing the Amnesia Deficit in AI Coding Agents

A common limitation of current AI coding agents is their tendency to lose context and degrade performance over extended work sessions. As the context window fills, earlier decisions, project conventions, and task states are either compressed or lost entirely, forcing developers to repeatedly re-explain their project’s intricacies.

Xiaomi contends that this approach is inherently limited for scalable applications. Their MiMo team stated in their launch blog, “What we need is not better compression, but an explicit storage-and-retrieval mechanism that decides what information should be written into persistent structures, and when it should be recalled.”

MiMo Code tackles this challenge with a cross-session memory system that leverages SQLite FTS5 for full-text search. This system operates across four layers: project memory (stored in a persistent MEMORY.md file), session checkpoints, scratch notes, and per-task progress logs.

A key innovation is its note-taking mechanism. Instead of requiring the primary coding agent to pause and document, the system deploys an independent “checkpoint-writer” sub-agent. This can be visualized as a construction contractor (the primary agent) working alongside a dedicated architect (the checkpoint-writer). While the contractor builds, the architect continuously updates blueprints, recording decisions, issues, and project status in real-time. When the contractor gets disoriented, the architect can instantly provide the necessary context, enabling the project to resume without momentum loss. In MiMo Code’s case, the system reconstructs the operational environment from structured checkpoints, preserving continuity.

Two self-improvement mechanisms further enhance the system: a `/dream` command that periodically reviews and de-duplicates historical sessions for long-term memory storage, and a “distill” function that identifies and automates recurring workflows from past sessions, a method mirrored by advancements from OpenAI and Anthropic.

Demonstrable Performance on Software Engineering Benchmarks

Xiaomi’s technical blog post highlights benchmark results where MiMo Code, powered by MiMo-V2.5-Pro, outperformed Claude Code with Claude Sonnet 4.6 across three evaluated categories:

Xiaomi MiMo Code AI Surpasses Claude in Ultra-Long 200+ Step Coding Tasks 4

SWE-bench Verified: 82% versus 79%
SWE-bench Pro: 62% versus 55%
Terminal Bench 2: 73% versus 69%

The harness itself significantly contributes to these gains. When the MiMo-V2.5-Pro model was run in both systems, MiMo Code achieved a 62% score on SWE-bench Pro compared to Claude Code’s 57%, and 73% on Terminal Bench 2 versus 68%. This five-point difference is attributed to the agent system’s design rather than the underlying model.

Xiaomi notably omitted comparisons against OpenAI’s Codex or Google’s Gemini CLI, focusing solely on Claude Code as its competitor, a strategic benchmark choice.

External benchmarks suggest potential reasons. The official Terminal-Bench 2.0 leaderboard indicates OpenAI’s Codex CLI with GPT-5.5 scores 82.2%, exceeding MiMo Code’s self-reported 73%. OpenAI’s GPT-5.5 announcement also claims 82.7% on the same benchmark. However, on SWE-Bench Pro, the landscape shifts, with OpenAI reporting GPT-5.5 at 58.6%, falling below MiMo Code + MiMo-V2.5-Pro’s claimed 62%. (MiMo Code is not yet listed on either official leaderboard, and direct comparison of self-reported metrics against leaderboard submissions requires careful consideration of configurations.)

Perhaps more telling than offline benchmarks is Xiaomi’s internal double-blind A/B evaluation. During its beta phase, 576 developers across 474 private repositories compared MiMo Code against Claude Code. While results were split nearly 50/50 for tasks under 200 execution steps, MiMo Code’s win rate surpassed 65% for tasks exceeding 200 steps. This supports Xiaomi’s assertion that its memory and state management architecture provides a distinct advantage in long-horizon development work.

Xiaomi acknowledges that standard benchmarks primarily measure single-task problem-solving and do not fully capture the multi-session capabilities of MiMo Code. As with all vendor self-reported data, independent verification is pending, and direct comparisons of harness performance are configuration-dependent. Nevertheless, the results align with a growing industry trend: the sophistication of scaffolding and harness engineering is becoming as critical as the core model’s capabilities in achieving high performance for AI coding agents.

Seamless Integration with Developer Workflows and Voice Command Capabilities

MiMo Code is engineered for integration into existing developer environments, operating directly within the terminal to read/write files, execute commands, and manage Git repositories.

The tool offers a zero-configuration setup, automatically connecting to “MiMo Auto,” a limited-time free access channel powered by Xiaomi’s multimodal MiMo V2.5 model with its extensive million-token context window. For users transitioning from other platforms, MiMo Code facilitates a smooth migration by automatically importing configurations and server settings from Claude Code.

Key features include:

Compose Mode: This mode allows developers to initiate a development cycle by specifying a high-level goal. The system then autonomously handles the entire process – design, planning, coding, testing, and review – adhering to a “heavy planning upfront, stable verification later” methodology.
Voice Control: Leveraging Xiaomi’s MiMo-ASR speech recognition and TenVAD voice activity detection, developers can issue verbal commands for hands-free operation, including dictating instructions and executing actions like “send” or “execute.” This feature is available for logged-in users.

Xiaomi reports that the agent harness alone delivers measurable improvements. When the same underlying MiMo model was tested in both MiMo Code and Claude Code, MiMo Code scored 62% on SWE-Bench Pro (versus 57% for Claude Code) and 73% on Terminal Bench 2 (versus 68% for Claude Code), indicating a roughly five-point advantage attributed to the agent system’s architecture.

As always, these figures are vendor-reported and await independent validation. However, the claims align with a broader industry observation: the effectiveness of AI coding agents is increasingly influenced by the quality of their supporting infrastructure and harness engineering, in addition to the core model’s capabilities.

Aggressive Pricing Strategy

A significant incentive for developers may be the bundled offering. MiMo Code includes access to “MiMo Auto,” a channel providing free, limited-time access to MiMo-V2.5. This natively multimodal model, released in April 2026, utilizes a sparse mixture-of-experts (MoE) design with 310 billion total parameters (15 billion active per inference) and a million-token context window, positioning it competitively against Anthropic’s Claude Sonnet 4.6 for multimodal agentic tasks.

As previously reported, the MiMo-V2.5 family models are MIT-licensed and noted for their efficiency and affordability in agentic tasks.

The more advanced MiMo-V2.5-Pro, a 1.02 trillion-parameter MoE model with 42 billion active parameters and a hybrid-attention architecture, led open-source performance on Xiaomi’s ClawEval agentic benchmark with a 63.8% success rate. Critically, it achieved this using approximately 70,000 tokens per trajectory, a 40-60% reduction compared to models like Anthropic’s Claude Opus 4.6, Google’s Gemini 3.1 Pro, or OpenAI’s GPT-5.4 for similar results.

Notably, the V2.5-Pro’s post-training explicitly focused on “harness awareness,” enhancing its ability to manage memory and context within agent scaffolds. This makes a Xiaomi-developed harness, optimized for this capability, a logical extension.

Xiaomi’s pricing strategy is equally competitive. MiMo-V2.5 is priced starting at $0.40 per million input tokens and $2.00 per million output tokens. The V2.5-Pro model ranges from $1.00/$3.00 per million (input/output) for context windows up to 256K, with higher context lengths doubling the input cost. Cache hits can reduce input costs to as low as $0.20-$0.40 per million tokens, making it one of the most cost-effective frontier models available globally.

VentureBeat Frontier AI Model API Pricing Snapshot

Model	Input	Output	Total Cost	Source
MiMo-V2.5 Flash	$0.10	$0.30	$0.40	Xiaomi MiMo
deepseek-v4-flash	$0.14	$0.28	$0.42	DeepSeek
deepseek-v4-pro	$0.435	$0.87	$1.305	DeepSeek
MiniMax-M3	$0.30	$1.20	$1.50	MiniMax
Gemini 3.1 Flash-Lite	$0.25	$1.50	$1.75	Google
Qwen3.7-Plus	$0.40	$1.60	$2.00	Alibaba Cloud
MiMo-V2.5	$0.40	$2.00	$2.40	Xiaomi MiMo
Grok 4.3 (low context)	$1.25	$2.50	$3.75	xAI
MiMo-V2.5 Pro (≤256K)	$1.00	$3.00	$4.00	Xiaomi MiMo
GLM-5	$1.00	$3.20	$4.20	Z.ai
Kimi-K2.6	$0.95	$4.00	$4.95	Moonshot/Kimi
GLM-5.1	$1.40	$4.40	$5.80	Z.ai
Grok 4.3 (high context)	$2.50	$5.00	$7.50	xAI
MiMo-V2.5 Pro (>256K)	$2.00	$6.00	$8.00	Xiaomi MiMo
Qwen3.7-Max	$2.50	$7.50	$10.00	Alibaba Cloud
Gemini 3.5 Flash	$1.50	$9.00	$10.50	Google
Gemini 3.1 Pro Preview (≤200K)	$2.00	$12.00	$14.00	Google
GPT-5.4	$2.50	$15.00	$17.50	OpenAI
Gemini 3.1 Pro Preview (>200K)	$4.00	$18.00	$22.00	Google
Claude Opus 4.8	$5.00	$25.00	$30.00	Anthropic
GPT-5.5	$5.00	$30.00	$35.00	OpenAI
Claude Fable 5 / Claude Mythos 5	$10.00	$50.00	$60.00	Anthropic

For developers preferring not to use Xiaomi’s models, MiMo Code offers compatibility with third-party backends, including token plans from DeepSeek, Moonshot’s Kimi, and Zhipu’s GLM, as well as any OpenAI-compatible API, mirroring the flexibility of its OpenCode predecessor.

Intensifying Competition in Terminal AI Coding Agents

MiMo Code enters a competitive landscape populated by established players such as Anthropic’s Claude Code, OpenAI’s Codex CLI, Google’s Gemini CLI, and open-source alternatives like OpenCode and Aider. The significant development here is Xiaomi’s entry into this space.

Xiaomi, the world’s third-largest smartphone manufacturer and a growing force in the electric vehicle market, has been systematically expanding its MiMo AI division. This follows the release of the MiMo-7B reasoning model in April 2025, the MiMo-VL vision-language series, MiMo-V2-Flash, the 1-trillion-parameter MiMo-V2-Pro in March 2026, and the flagship V2.5 family in April 2026.

The initiative is spearheaded by Fuli Luo, a key figure from DeepSeek’s R1 project. Luo has described Xiaomi’s frontier AI push as a “quiet ambush,” supported by a 100-trillion free token grant for developers announced alongside the V2.5 launch.

This strategy echoes tactics seen from DeepSeek, Alibaba’s Qwen, MiniMax, and Moonshot AI’s Kimi series: offering highly capable models and tools under permissive licenses at prices significantly lower than those of U.S.-based labs, with the aim of cultivating a strong developer ecosystem through mindshare.

By combining an open-source agent harness with free access to a frontier-class model, Xiaomi is effectively removing both licensing and usage costs for adoption, at least for the initial phase.

Implications for Enterprises and Technical Decision-Makers

For engineering leaders, MiMo Code presents a low-risk, high-potential evaluation opportunity. Its MIT-style licensing allows for modification and commercial integration, its OpenCode lineage ensures architectural transparency, and its support for bring-your-own-model capabilities enables integration with internally approved endpoints, mitigating concerns about data routing to Xiaomi’s cloud.

The persistent memory system directly addresses a critical and widely acknowledged pain point in agentic development workflows, an area where competitors are also rapidly innovating.

However, several factors warrant careful consideration. The “limited-time” free model access is inherently temporary. Furthermore, routing code context through Xiaomi’s servers may conflict with stringent data residency or intellectual property policies for some organizations. The claimed benchmark advantages over Claude Code are self-reported, and the V0.1.0 release designation suggests the software is still in its early stages of maturity.

Organizations subject to U.S. government procurement restrictions concerning Chinese technology vendors should also factor this into their adoption decisions.

Business Style Takeaway: Xiaomi’s entry with MiMo Code highlights the escalating importance of efficient memory management and harness engineering in AI coding assistants, moving beyond raw model power. This development underscores a shift towards more integrated, state-aware AI tools, potentially lowering development costs and increasing productivity for businesses willing to explore open-source alternatives and new vendor ecosystems.

Original article : venturebeat.com

No votes yet.

Please wait...