MiniMax-M3 Outshines GPT-5.5 & Gemini 3.1 Pro on Benchmarks for a Fraction of the Cost

Chinese AI firm MiniMax has launched its groundbreaking M3 large language model, aiming to redefine the enterprise AI landscape with advanced coding and agentic capabilities, extensive context windows, and native multimodality at a significantly reduced cost compared to leading proprietary models. Available via API, the M3 model boasts a 1-million-token context window and multimodal understanding, with pricing beginning at just $20 per month under new subscription plans. This release challenges the established market dynamics, offering powerful features previously exclusive to closed-source ecosystems.

Further bolstering its accessibility, MiniMax has announced plans to release M3 under an open-source license with “open weights” within the next ten days. This will allow businesses to download, customize, and deploy the model entirely in-house, free of charge. Currently, access is available through the MiniMax API at a promotional rate of $0.30 per million input tokens and $1.20 per million output tokens, considerably lower than major U.S. competitors such as Google, OpenAI, and Anthropic. Even at its full price, M3 is projected to cost 8-20% of what comparable proprietary models charge.

The launch of M3 disrupts the traditional trade-off between sophisticated closed-source models and more accessible, but less capable, open-source alternatives. By integrating frontier-level performance, long context handling, and multimodal features into an open-weights framework, MiniMax is setting a new industry standard and lowering the barrier to entry for complex AI development.

VentureBeat Frontier AI Model API Pricing Snapshot (USD per 1 million tokens)

| Model | Input | Output | Total (input + output) | Source |
|---|---|---|---|---|
| MiMo-V2.5 Flash | $0.10 | $0.30 | $0.40 | Xiaomi MiMo |
| deepseek-v4-flash | $0.14 | $0.28 | $0.42 | DeepSeek |
| deepseek-v4-pro | $0.435 | $0.87 | $1.305 | DeepSeek |
| MiniMax-M3 | $0.30 | $1.20 | $1.50 (limited time only) | MiniMax |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | $1.75 | Google |
| MiMo-V2.5 | $0.40 | $2.00 | $2.40 | Xiaomi MiMo |
| Grok 4.3 low context | $1.25 | $2.50 | $3.75 | xAI |
| GLM-5 | $1.00 | $3.20 | $4.20 | Z.ai |
| Kimi-K2.6 | $0.95 | $4.00 | $4.95 | Moonshot/Kimi |
| GLM-5.1 | $1.40 | $4.40 | $5.80 | Z.ai |
| Grok 4.3 high context | $2.50 | $5.00 | $7.50 | xAI |
| Qwen3.7-Max | $2.50 | $7.50 | $10.00 | Alibaba Cloud |
| Gemini 3.5 Flash | $1.50 | $9.00 | $10.50 | Google |
| Gemini 3.1 Pro Preview ≤200K | $2.00 | $12.00 | $14.00 | Google |
| GPT-5.4 | $2.50 | $15.00 | $17.50 | OpenAI |
| Gemini 3.1 Pro Preview >200K | $4.00 | $18.00 | $22.00 | Google |
| Claude Opus 4.8 | $5.00 | $25.00 | $30.00 | Anthropic |
| GPT-5.5 | $5.00 | $30.00 | $35.00 | OpenAI |
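To see what these per-million-token prices mean for a concrete job, the short Python sketch below prices a workload using a few rows from the table. The workload size (50 million input and 10 million output tokens per month) is a hypothetical assumption chosen for illustration; only the prices come from the table, and the MiniMax-M3 figure is the limited-time promotional rate.

```python
# USD per 1 million tokens (input, output), copied from the pricing table.
PRICES = {
    "MiniMax-M3": (0.30, 1.20),  # limited-time promotional rate
    "deepseek-v4-pro": (0.435, 0.87),
    "Gemini 3.1 Pro Preview ≤200K": (2.00, 12.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a workload at the model's listed per-1M-token prices."""
    price_in, price_out = PRICES[model]
    return input_tokens / 1_000_000 * price_in + output_tokens / 1_000_000 * price_out


if __name__ == "__main__":
    # Hypothetical monthly workload: 50M input tokens and 10M output tokens.
    n_in, n_out = 50_000_000, 10_000_000
    baseline = workload_cost("MiniMax-M3", n_in, n_out)
    for name in PRICES:
        cost = workload_cost(name, n_in, n_out)
        print(f"{name:30s} ${cost:8.2f}  ({cost / baseline:4.1f}x MiniMax-M3)")
```

With this input-heavy mix, deepseek-v4-pro ($30.45) comes out slightly above MiniMax-M3 ($27.00) even though its summed list price in the table is lower, so the ranking depends on the input/output ratio of the actual workload.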

MiniMax Sparse Attention (MSA) Revolutionizes Model Efficiency

The efficiency of the M3 model is largely attributed to its attention design, which departs from the dense attention used in standard Transformer networks. Standard attention scales quadratically ($O(N^2)$) with input length $N$, because every token is compared with every other token, which makes long sequences expensive in both compute and cost. MiniMax addresses this with its own MiniMax Sparse Attention (MSA) technique.
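The quadratic term is easy to see by counting query-key score computations. The Python sketch below compares dense causal attention with a generic block-sparse scheme in which each token attends to at most `top_k` blocks of `block_size` keys. The `block_size` and `top_k` values are hypothetical, not MiniMax's published MSA settings, and the count covers attention scores only.

```python
def dense_causal_pairs(n: int) -> int:
    """Query-key scores for dense causal attention: token t attends to keys 0..t."""
    return n * (n + 1) // 2


def sparse_causal_pairs(n: int, block_size: int, top_k: int) -> int:
    """Scores when each token attends to at most top_k blocks of block_size keys.

    Token t sees min(t + 1, budget) keys, so the total grows linearly in n
    once n exceeds the per-token budget.
    """
    budget = block_size * top_k
    if n <= budget:
        return dense_causal_pairs(n)
    return dense_causal_pairs(budget) + (n - budget) * budget


if __name__ == "__main__":
    # Hypothetical settings; not MSA's actual block size or selection count.
    block_size, top_k = 128, 16
    for n in (8_192, 131_072, 1_000_000):
        dense = dense_causal_pairs(n)
        sparse = sparse_causal_pairs(n, block_size, top_k)
        print(f"n={n:>9,}  dense={dense:.3e}  sparse={sparse:.3e}  ratio={dense / sparse:6.1f}x")
```

Doubling `n` roughly quadruples the dense count but only doubles the sparse one. An attention-only ratio like this can be much larger than the 20x per-token reduction MiniMax reports at 1M tokens, because projections and feed-forward layers, which sparse attention does not shrink, also contribute to per-token compute, and the block-selection step itself is not counted here.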

MSA works like an indexing system. A pre-filtering phase partitions the key-value (KV) matrices into blocks so that each query is computed only against the blocks relevant to it. At the kernel level, MSA uses a “KV outer gather Q” approach: KV blocks form the outer loop, and for each block the kernel gathers only the queries that need it. Each block is therefore read once with contiguous memory access, which substantially improves hardware utilization. In MiniMax’s internal tests, MSA ran more than four times faster than existing open-source implementations such as Flash-Sparse-Attention and flash-moba.
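The loop order described above can be illustrated with a small pure-Python sketch. Several parts are assumptions rather than MSA's published design: the pre-filter here scores each KV block by the dot product between the query and the block's mean key and keeps the `top_k` best blocks, attention is non-causal for simplicity, and a real kernel would operate on GPU tiles rather than Python lists. What the sketch does show is the “KV outer gather Q” ordering: each KV block is loaded once, only the queries that selected it are gathered, and an online softmax merges the per-block partial results exactly.

```python
import math
from typing import List, Set

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def select_blocks(queries: List[Vec], keys: List[Vec], block_size: int, top_k: int) -> List[Set[int]]:
    """Assumed pre-filter: rank KV blocks by q . mean(block keys), keep top_k per query."""
    n_blocks = math.ceil(len(keys) / block_size)
    means = []
    for b in range(n_blocks):
        blk = keys[b * block_size:(b + 1) * block_size]
        means.append([sum(col) / len(blk) for col in zip(*blk)])
    return [
        set(sorted(range(n_blocks), key=lambda b: dot(q, means[b]), reverse=True)[:top_k])
        for q in queries
    ]


def kv_outer_gather_q(queries, keys, values, block_size, top_k):
    """Block-sparse attention with KV blocks as the outer loop."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    selection = select_blocks(queries, keys, block_size, top_k)
    n_q, d_v = len(queries), len(values[0])
    run_max = [-math.inf] * n_q              # running max score per query
    run_sum = [0.0] * n_q                    # running softmax denominator per query
    acc = [[0.0] * d_v for _ in range(n_q)]  # running weighted sum of values
    blocks_loaded = 0
    for b in range(math.ceil(len(keys) / block_size)):
        gathered = [i for i in range(n_q) if b in selection[i]]
        if not gathered:
            continue  # no query selected this block, so it is never read
        k_blk = keys[b * block_size:(b + 1) * block_size]  # one contiguous read
        v_blk = values[b * block_size:(b + 1) * block_size]
        blocks_loaded += 1
        for i in gathered:
            scores = [dot(queries[i], k) * scale for k in k_blk]
            new_max = max(run_max[i], max(scores))
            fix = math.exp(run_max[i] - new_max)  # rescale earlier partial results
            run_sum[i] *= fix
            acc[i] = [a * fix for a in acc[i]]
            for s, v in zip(scores, v_blk):
                w = math.exp(s - new_max)
                run_sum[i] += w
                acc[i] = [a + w * x for a, x in zip(acc[i], v)]
            run_max[i] = new_max
    outputs = [[a / run_sum[i] for a in acc[i]] for i in range(n_q)]
    return outputs, blocks_loaded


def dense_attention(queries, keys, values):
    """Reference full attention, used to check the block-sparse result."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [dot(q, k) * scale for k in keys]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wi * v[j] for wi, v in zip(w, values)) / z for j in range(len(values[0]))])
    return out


if __name__ == "__main__":
    # Toy data: 4 queries, 8 keys/values of width 2, blocks of 2 keys.
    Q = [[math.sin(i + j) for j in range(2)] for i in range(4)]
    K = [[math.cos(i * j + 1) for j in range(2)] for i in range(8)]
    V = [[float(i), float(-i)] for i in range(8)]
    out, loaded = kv_outer_gather_q(Q, K, V, block_size=2, top_k=2)
    print(loaded, out[0])
```

With `top_k` equal to the number of blocks, every query selects every block and the output matches `dense_attention` up to floating-point rounding, which is a convenient correctness check; smaller `top_k` values skip blocks and trade accuracy for compute.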

At its maximum context length of 1 million tokens, M3’s per-token compute is one-twentieth that of previous-generation models. This yields a ninefold speedup in the prefill stage and a fifteenfold speedup during decoding, making long-context processing substantially faster and cheaper.

M3 was built from the ground up as a natively multimodal system rather than by fusing separate text and vision models. MiniMax mixed text, images, and other visual data into its pretraining corpus, which now exceeds 100 trillion tokens. This tight alignment lets the model translate complex visuals such as charts and maps into structured code without losing context.

On benchmarks, M3 scores 59.0% on SWE-Bench Pro, ahead of closed-source models such as GPT-5.5 and Gemini 3.1 Pro on this agentic coding test. It also scores 66.0% on Terminal-Bench 2.1, 74.2% on MCP Atlas, and 83.5% on BrowseComp, where it outperforms Claude Opus 4.7 in autonomous browsing.

While M3 performs strongly across these benchmarks, it still faces stiff competition from premium models such as Anthropic’s Claude Opus 4.8. On SWE-Bench Pro, M3’s 59.0% trails Opus 4.8’s 69.2%. On Terminal-Bench 2.1, which tests execution in automated system environments, M3’s 66.0% is behind Opus 4.8’s 74.6%. In GUI interaction benchmarks, M3 scores 70.0% against Opus 4.8’s 83.4%. These results show that while M3 offers exceptional efficiency and capability among open models, top-tier closed-source models still hold an edge on highly complex reasoning tasks.

Against other open-weights models such as DeepSeek-V4 Pro Max, however, M3 is highly competitive. It edges out DeepSeek-V4 Pro Max on SWE-Bench Pro with a 59.0% resolve rate versus 55.4%. DeepSeek-V4 Pro Max leads slightly in command-line environments (67.9% vs. 66.0%), while M3 holds a narrow advantage in tool-use frameworks such as MCP Atlas (74.2% vs. 73.6%), and the two are effectively tied on web browsing. This suggests that MiniMax’s efficient attention mechanism can deliver competitive performance without the extensive parameter scaling seen in some other models.

MiniMax Code AI Introduces Advanced Agentic Team Capabilities

MiniMax is translating these architectural advancements into practical applications, including standalone products and customizable subscription tiers. The flagship offering is **MiniMax Code**, an AI agent designed to leverage M3’s multi-step reasoning and agentic capabilities. Operating through web or desktop interfaces, MiniMax Code functions as an “Agent Team,” capable of decomposing complex engineering tasks into concurrent workflows. It utilizes a “Producer + Verifier” adversarial loop, where one agent instance generates code and another rigorously tests and refines it, enabling autonomous operation over extended periods.
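The “Producer + Verifier” pattern can be pictured as a bounded generate-and-test loop. In the Python sketch below, `toy_producer` and `toy_verifier` are hypothetical stand-ins (plain functions rather than model calls) so the control flow runs on its own; MiniMax Code’s actual orchestration, prompts, and stopping rules are not described in the source, and `max_rounds` is an assumed safeguard.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Verdict:
    passed: bool
    feedback: List[str] = field(default_factory=list)


def producer_verifier_loop(
    producer: Callable[[List[str]], T],
    verifier: Callable[[T], Verdict],
    max_rounds: int = 5,
) -> Optional[T]:
    """Alternate producing and verifying until a candidate passes or rounds run out.

    The producer sees all feedback gathered so far, so each round can address
    the problems the verifier reported in earlier rounds.
    """
    feedback: List[str] = []
    for _ in range(max_rounds):
        candidate = producer(feedback)
        verdict = verifier(candidate)
        if verdict.passed:
            return candidate
        feedback.extend(verdict.feedback)
    return None


# Hypothetical stand-ins for the two agents: the "task" is an add(a, b) function.
def toy_producer(feedback: List[str]) -> Callable[[int, int], int]:
    if not feedback:
        return lambda a, b: a - b  # first draft contains a bug
    return lambda a, b: a + b      # revised draft after reading the feedback


def toy_verifier(candidate: Callable[[int, int], int]) -> Verdict:
    cases = [((2, 3), 5), ((0, 0), 0), ((-1, 4), 3)]
    failures = [
        f"add{args} returned {candidate(*args)}, expected {want}"
        for args, want in cases
        if candidate(*args) != want
    ]
    return Verdict(passed=not failures, feedback=failures)


if __name__ == "__main__":
    accepted = producer_verifier_loop(toy_producer, toy_verifier)
    print("accepted:", accepted is not None, accepted(2, 3) if accepted else None)
```

Keeping the verifier independent of the producer is what makes the loop adversarial: the producer never decides on its own that its output is done.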

Because MiniMax Code has native visual grounding, it can interact directly with a computer; for example, users can issue voice commands to have it copy data from spreadsheets into enterprise ERP clients. For developers, M3 is available through an API key (sk-cp) that works with popular coding tools such as Claude Code, Cursor, Roo Code, and Cline. The API includes a “thinking mode” toggle, so users can choose deep reasoning and long-horizon planning or switch to low-latency text completion.

The associated **Token Plan** subscription tiers offer shared multimodal quotas billed annually:

  • Plus ($20/month): Provides approximately 1.7 billion tokens per month, supporting 3-4 concurrent agents.

  • Max ($50/month): Offers around 5.1 billion tokens per month, manages 4-5 concurrent agents, and includes 3 automated video clips daily via Hailuo 2.3.

  • Ultra ($120/month): Delivers approximately 9.8 billion tokens per month, facilitates 6-7 concurrent agents, and extends video generation capacity to 5 clips per day.
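Dividing each tier’s monthly price by its approximate token allowance gives a rough per-billion-token rate. The Python snippet below uses only the figures listed above and assumes the annual charge is twelve times the listed monthly rate; because the allowances are approximate and shared across modalities, the results are indicative only.

```python
# (USD per month, approximate tokens per month) from the Token Plan tiers.
TIERS = {
    "Plus": (20, 1.7e9),
    "Max": (50, 5.1e9),
    "Ultra": (120, 9.8e9),
}


def usd_per_billion_tokens(monthly_price: float, monthly_tokens: float) -> float:
    """Effective price per billion tokens if the full quota is used."""
    return monthly_price / (monthly_tokens / 1e9)


def annual_bill(monthly_price: float) -> float:
    """Assumes the annual charge is 12x the listed monthly rate."""
    return monthly_price * 12


if __name__ == "__main__":
    for name, (price, tokens) in TIERS.items():
        print(f"{name:5s} ${usd_per_billion_tokens(price, tokens):5.2f}/billion tokens, "
              f"${annual_bill(price):,.0f} per year")
```

On these figures Max is the cheapest per token (about $9.80 per billion), while Ultra (about $12.24) costs slightly more per token than Plus (about $11.76); Ultra’s premium buys more concurrent agents and video clips rather than cheaper tokens.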

Open Weights Strategy Enhances Enterprise Adoption and Security

MiniMax’s commitment to releasing M3 under an open-weights license is a significant strategic move, particularly for enterprises with strict data-privacy and compliance requirements. Once the weights and documentation are published on platforms such as Hugging Face and GitHub, organizations will be able to run M3 entirely within their own infrastructure. Local deployment removes the data-exposure risks of sending requests to a public API and gives businesses full control over the model.

Open weights also allow deep customization, including bespoke fine-tuning, architectural modifications, and embedded system prompts, turning M3 into a tailored proprietary asset. Whereas closed-source providers typically limit customization to basic fine-tuning or prompt engineering, an open-weights model gives full control over the pipeline and its optimization. It also frees organizations from perpetual per-token API pricing, and M3’s sharply reduced compute demands lower the cost of running it in-house.

| Feature / Model Attribute | Closed API Providers (e.g., GPT-5.5, Opus 4.7) | Open-Weights Frontier (MiniMax M3) |
|---|---|---|
| Data Privacy & Boundaries | Requires external API requests; data may be ingested by the provider. | Full local isolation; runs entirely inside private user clusters. |
| Custom Optimization | Limited to basic fine-tuning wrappers or prompt engineering. | Full pipeline control; architecture allows deep adapter/weights customization. |
| Cost Structure | Bound to perpetual per-token API pricing. | Per-token compute cut to 1/20th at 1M-token context; eases hardware limits. |
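The cost comparison can be made concrete with a simple break-even calculation between per-token API billing and a flat self-hosting bill. In the Python sketch below, the $20,000 monthly infrastructure cost and the 80/20 input/output split are hypothetical placeholders, not figures from MiniMax or VentureBeat; only the API prices come from the pricing table.

```python
def blended_price(price_in: float, price_out: float, input_share: float) -> float:
    """USD per 1M tokens for a workload where input_share of tokens are input."""
    return input_share * price_in + (1.0 - input_share) * price_out


def break_even_tokens(monthly_infra_usd: float, price_in: float, price_out: float,
                      input_share: float) -> float:
    """Monthly token volume at which a flat self-hosting bill equals the API bill.

    Assumes the self-hosted cost is a fixed monthly amount covering the whole
    volume, which only holds up to the cluster's serving capacity.
    """
    return monthly_infra_usd / blended_price(price_in, price_out, input_share) * 1e6


if __name__ == "__main__":
    infra = 20_000.0  # hypothetical monthly cost of a private GPU cluster, in USD
    share = 0.8       # hypothetical: 80% of tokens are input
    for name, p_in, p_out in [("GPT-5.5", 5.00, 30.00), ("MiniMax-M3 API", 0.30, 1.20)]:
        tokens = break_even_tokens(infra, p_in, p_out, share)
        print(f"{name:15s} break-even at about {tokens / 1e9:.1f}B tokens/month")
```

Below the break-even volume, paying per token is cheaper; above it, the flat bill wins, provided the cluster can actually serve that volume. Because the result scales directly with the assumed infrastructure cost, it is worth rerunning with real hardware quotes.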

Initial Developer Community Reception is Highly Positive

The developer ecosystem has responded enthusiastically to M3’s performance benchmarks, particularly its long-horizon autonomous capabilities and cost-effectiveness. A key point of discussion is a 12-hour autonomous verification test where M3 successfully reproduced experimental results from an academic paper titled “Learning Dynamics of LLM Finetuning.” As highlighted by researcher @MikaStars39 on X, M3 autonomously produced code, experimental figures, and validated key findings related to SFT stages, DPO experiments, and mitigation methods.

Tool creators have also recognized the practical economic benefits of M3’s new attention mechanism. The team behind the agentic AI coding harness Cline confirmed day-one compatibility, noting M3’s breakthrough in sparse-attention architecture that reduces compute and cost significantly. Tech commentator @jumperz observed that M3’s approach to context scaling through architectural optimization, rather than brute-force hardware scaling, establishes a highly efficient open-source baseline, signaling a shift towards architectural innovation driving the next phase of AI development.

For enterprises building autonomous software or AI infrastructure, MiniMax M3 presents a compelling value proposition. DeepSeek-V4 Pro has a slightly lower API list price, but M3’s higher SWE-Bench Pro score (59.0% vs. 55.4%) justifies the small premium. More importantly, as an open-weights model M3 can be deployed locally, avoiding data egress fees and vendor lock-in and giving organizations a permanent, privately owned AI asset that is efficient to run.

Business Takeaway: MiniMax’s M3 release marks a major shift in the AI model landscape by combining high-end capabilities with open-source accessibility and aggressive pricing, prompting businesses to re-evaluate their AI strategies and vendor choices. The availability of open weights fundamentally changes total cost of ownership and data control for enterprises, making advanced AI deployment more feasible and secure. The move challenges established players and accelerates the spread of cutting-edge AI technology across industries.

Based on materials from: venturebeat.com
