Alibaba’s Qwen3.7-Max AI Runs 35 Hours Autonomously, Integrates Claude Code

The artificial intelligence landscape has firmly transitioned into the “agent era,” a new paradigm where AI models are capable of much more than generating text. They now actively plan, execute, and refine complex tasks over extended periods, spanning days rather than mere seconds.

In this evolving environment, it comes as little surprise that Alibaba’s renowned Qwen Team of AI researchers has unveiled a model designed for multi-day autonomous agentic work. This model, Qwen3.7-Max, has reportedly achieved approximately 35 hours of continuous autonomous execution, according to a company blog post. However, unlike previous Qwen Team releases, this powerful model is not open-source but proprietary.

This strategic shift is perhaps not unexpected, particularly following the departure of several key Qwen Team leaders earlier this year. From a financial perspective, Alibaba’s decision to monetize Qwen3.7-Max is understandable. Training advanced AI models like Qwen3.7-Max is a significant undertaking, and offering them freely as open-source initiatives does not immediately facilitate cost recovery.

In this regard, Alibaba appears to be aligning its strategy with major American AI firms such as OpenAI and Google. These companies typically offer their most advanced models through paid APIs, subscription bundles, or premium web plans, while making slightly less performant versions available as open-source options.

Nevertheless, the introduction of Qwen3.7-Max provides increased choices for enterprises and individual users, fostering greater competition for leading AI laboratories in the West. While increased competition generally benefits consumers, the model’s accessibility solely from Chinese-based endpoints may present limitations for American and European enterprises aiming to ensure robust compliance and security, particularly when fulfilling government contracts or adhering to data sovereignty regulations.

The Marathon AI Era

To fully grasp the significance of Qwen3.7-Max as a departure from previous models, it’s essential to examine its training methodology and operational capabilities.

Traditional language models often falter when tasked with maintaining a single train of thought over thousands of conversational turns. They tend to forget instructions, introduce fabricated variables (hallucinate), or become trapped in cyclical logical errors. Qwen3.7-Max, however, was specifically engineered as a “versatile agent foundation” with “long-horizon reasoning” capabilities to overcome precisely these limitations.

A compelling demonstration of this capability is an autonomous engineering task detailed by the Qwen team. The model was granted access to an isolated server equipped with a T-Head ZW-M890 PPU—a hardware architecture unfamiliar to the model from its training data. Its objective was to optimize an attention kernel.

Over an uninterrupted 35-hour period, Qwen3.7-Max operated autonomously. It executed 1,158 distinct tool calls, performed 432 kernel evaluations, diagnosed compilation failures, and iteratively refined the code, ultimately achieving a 10.0x geometric mean speedup. In comparison, competing Chinese models like z.ai’s GLM-5.1 and Moonshot’s Kimi K2.6 achieved maximum speedups of 7.3x and 5.0x, respectively, and often ceased their sessions when progress stalled. It’s important to note that both of these competitor models are available as open-source.

This sustained performance is attributed to what Alibaba terms “environment scaling.” Much like early large language models (LLMs) improved through exposure to diverse textual data, Qwen3.7-Max was trained across an extensive and scaled array of dynamic agentic environments.

The model can simulate a year-long startup lifecycle within the “YC-Bench” evaluation, navigating hundreds of decision-making phases, including personnel management and contract screening. In this simulation, Qwen3.7-Max generated $2.08 million in virtual revenue, nearly doubling the performance of its predecessor, Qwen3.6-Plus. Furthermore, the model incorporates built-in self-monitoring for reward-hacking, enabling it to autonomously detect attempts to game training environments and implement corrective heuristic rules.

A Brain for Any Scaffold

From a product development standpoint, Qwen3.7-Max is positioned as the cognitive engine for contemporary software development and enterprise automation.

The model boasts an expansive 1-million-token context window and a 64K maximum output limit, offering substantial capacity for processing extensive codebases or lengthy technical documents.

One of its most notable features is “cross-harness generalization.” Instead of being rigidly optimized for a specific proprietary interface, Qwen3.7-Max is designed to function as an adaptable intelligence layer across diverse agent frameworks. It natively supports the Anthropic API protocol, enabling developers to integrate it directly into existing tools such as Claude Code or OpenClaw.

Benchmark data released by Alibaba suggests this generalized approach has yielded significant advantages. On the Apex Math Reasoning benchmark, Qwen3.7-Max achieved a score of 44.5, surpassing Claude Opus-4.6 Max’s score of 34.5 and DeepSeek V4-Pro Max’s 38.3. The model also delivered dominant scores on Humanity’s Last Exam (41.4) and the realistic coding agent benchmark MCP-Atlas (76.4).

Alibaba's Qwen3.7-Max AI Runs 35 Hours Autonomously, Integrates Claude Code 2

This translates into tangible utility for end-users. Through open-source Model Context Protocol (MCP) integrations, the model can function as an autonomous office assistant, capable of interpreting university formatting specifications and automatically reformatting a Word document using command-line tools without human intervention.

Deploying this level of intelligence incurs a distinct cost. Developers accessing the API via Alibaba Cloud Model Studio will be charged $2.50 per 1 million input tokens and $7.50 per 1 million output tokens. The platform also includes explicit pricing for cache creation and retrieval, as well as a $10 fee per 1,000 calls for integrated web searches, although code interpreter tools are currently available free of charge for a limited period.

Qwen3.7-Max occupies a strategic position in the current API economy. While it commands a premium compared to aggressively priced domestic competitors—costing nearly double DeepSeek V4 Pro ($5.22) and Z.ai’s GLM-5.1 ($5.80)—it significantly undercuts the Western frontier models it frequently matches on performance benchmarks.

For context, executing complex agentic workflows through OpenAI’s GPT-5.4 or Anthropic’s Claude Opus 4.7 incurs costs of $17.50 and $30.00 per million tokens, respectively. VentureBeat’s pricing chart below illustrates this comparison:

Model Input Output Total Cost Source
MiMo-V2.5 Flash $0.10 $0.30 $0.40 Xiaomi MiMo
MiniMax M2.7 $0.30 $1.20 $1.50 MiniMax
Gemini 3.5 Flash-Lite $0.25 $1.50 $1.75 Google
MiMo-V2.5 $0.40 $2.00 $2.40 Xiaomi MiMo
Kimi-K2.6 $0.95 $4.00 $4.95 Moonshot/Kimi
GLM-5 $1.00 $3.20 $4.20 Z.ai
Grok 4.3 (low context) $1.25 $2.50 $3.75 xAI
DeepSeek V4 Pro $1.74 $3.48 $5.22 DeepSeek
GLM-5.1 $1.40 $4.40 $5.80 Z.ai
Claude Haiku 4.5 $1.00 $5.00 $6.00 Anthropic
Grok 4.3 (high context) $2.50 $5.00 $7.50 xAI
Qwen3.7-Max $2.50 $7.50 $10.00 Alibaba Cloud
Gemini 3.5 Flash $1.50 $9.00 $10.50 Google
Gemini 3.1 Pro Preview (≤200K) $2.00 $12.00 $14.00 Google
GPT-5.4 $2.50 $15.00 $17.50 OpenAI
Gemini 3.1 Pro Preview (>200K) $4.00 $18.00 $22.00 Google
Claude Opus 4.7 $5.00 $25.00 $30.00 Anthropic
GPT-5.5 $5.00 $30.00 $35.00 OpenAI

By pricing Qwen3.7-Max just below Google’s Gemini 3.5 Flash ($10.50) but considerably above budget-tier models, Alibaba is signaling that this release is not intended as a commodity offering but rather as a flagship reasoning engine. The pricing strategy appears designed to attract enterprise workloads away from the most expensive solutions offered by Silicon Valley’s leading AI firms.

Licensing Remains Proprietary for Now

Despite its technical advancements, the most debated aspect of Qwen3.7-Max is its distribution model. Alibaba is categorizing this release as a “proprietary model,” accessible exclusively via API.

Historically, Alibaba’s Qwen models have been highly valued by the open-source and local LLM communities, with previous versions like Qwen 2.5 and Qwen 3.6 making their weights publicly available. Open weights empower developers, researchers, and enterprises to download, run on their own infrastructure, and fine-tune models for specific or data-sensitive applications without transmitting proprietary information to third-party servers.

By restricting Qwen3.7-Max to an API-only format, Alibaba is adopting the commercial strategy employed by industry leaders like OpenAI (with GPT-4) and Anthropic (with Claude). For enterprise users, this necessitates entrusting their data streams to Alibaba Cloud and relying entirely on internet connectivity for their agentic workflows. For the open-source community, this means a loss of access to what is currently one of the most sophisticated models available.

Community Reactions Split Between Awe and Disappointment

The reaction from the developer community has been immediate, marked by a blend of profound admiration for the engineering feat and considerable frustration regarding the licensing model.

Prominent AI commentator Sudo su (@sudoingX) articulated the general sentiment on X (formerly Twitter): “qwen is unreal. they just dropped 3.7 max and it is beating opus 4.6 max on most of the benchmarks they ran.”

The technical specifications, particularly the model’s endurance, have astonished many in the field. “the apex math number, 44.5 against opus 34.5, that is not a small gap,” Sudo su observed. “the 35 hours straight on a kernel optimization task with 1000+ tool calls is the part i keep rereading. that is the agent era thing actually happening, not a slide.”

The rapid pace of Alibaba’s development cycle is also drawing significant attention. With Qwen 3.6 released just last month, the advancement to 3.7-Max underscores a relentless innovation cadence. As Sudo su noted, “nobody else is moving like this.”

However, the accolades are significantly tempered by the shift towards a closed ecosystem. The unavailability of the model weights is viewed as a setback for the localized AI movement, which depends on state-of-the-art open models to advance capabilities on consumer hardware and private enterprise clusters.

“one thing though, please open source this one too,” Sudo su pleaded in their post. “3.6 dense made the entire local llm ecosystem better. the max tier going api only would close a door we have been keeping open. give us the weights eventually.”

Qwen3.7-Max definitively demonstrates that the autonomous agent era is no longer a theoretical concept but a present reality, capable of executing complex engineering tasks independently. The critical question now is whether this new frontier of AI will evolve into a democratized resource accessible for local deployment, or remain an intelligence utility exclusively available through cloud rental. As of now, with Qwen3.7-Max, the latter is unequivocally the case.

Business Style Takeaway: Alibaba’s Qwen3.7-Max signifies a major advancement in AI agent capabilities, demonstrating sustained autonomous execution for complex tasks and challenging Western AI dominance on benchmarks. However, its proprietary licensing marks a strategic pivot away from open-source, signaling that cutting-edge AI development is increasingly becoming a commercialized, API-driven offering, which enterprises must weigh against their data security and sovereignty requirements.

According to the portal: venturebeat.com

No votes yet.
Please wait...

Leave a Reply

Your email address will not be published. Required fields are marked *