
Alibaba has unveiled Qwen3.7-Plus, the latest iteration of its prominent large language model (LLM) series. This new model distinguishes itself with enhanced multimodal capabilities and a significant 60% reduction in cost compared to its predecessor, Qwen3.7-Max, which was exclusively text-based.
However, Qwen3.7-Plus, much like Qwen3.7-Max, is accessible solely through proprietary application programming interfaces (APIs) and Qwen Chat, operating under a closed commercial license.
This strategic shift represents a departure from Alibaba’s previous emphasis on releasing powerful, near state-of-the-art open-source models. Such a move is likely to cause disappointment among enterprises and developers, including major U.S. companies like Airbnb, who have come to rely on the open-source Qwen models.
Despite this change, Qwen3.7-Plus warrants attention due to its competitive pricing and robust performance in multimodal tasks, such as generating high-quality visuals and analyzing video, imagery, and screenshots—capabilities absent in the text-only Qwen3.7-Max. Its cost positions it favorably among current high-performance AI models, slightly exceeding the promotional pricing of MiniMax’s new M3 model from a Chinese competitor.
VentureBeat Frontier AI Model API Pricing Snapshot
|
Model |
Input |
Output |
Total Cost |
Source |
|
MiMo-V2.5 Flash |
$0.10 |
$0.30 |
$0.40 |
Xiaomi MiMo |
|
deepseek-v4-flash |
$0.14 |
$0.28 |
$0.42 |
DeepSeek |
|
deepseek-v4-pro |
$0.435 |
$0.87 |
$1.305 |
DeepSeek |
|
MiniMax-M3 |
$0.30 |
$1.20 |
$1.50 |
MiniMax |
|
Qwen3.7-Plus |
$0.40 |
$1.60 |
$2.00 |
Alibaba Cloud |
|
Gemini 3.1 Flash-Lite |
$0.25 |
$1.50 |
$1.75 |
|
|
MiMo-V2.5 |
$0.40 |
$2.00 |
$2.40 |
Xiaomi MiMo |
|
Grok 4.3 low context |
$1.25 |
$2.50 |
$3.75 |
xAI |
|
GLM-5 |
$1.00 |
$3.20 |
$4.20 |
Z.ai |
|
Kimi-K2.6 |
$0.95 |
$4.00 |
$4.95 |
Moonshot/Kimi |
|
GLM-5.1 |
$1.40 |
$4.40 |
$5.80 |
Z.ai |
|
Grok 4.3 high context |
$2.50 |
$5.00 |
$7.50 |
xAI |
|
Qwen3.7-Max |
$2.50 |
$7.50 |
$10.00 |
Alibaba Cloud |
|
Gemini 3.5 Flash |
$1.50 |
$9.00 |
$10.50 |
|
|
Gemini 3.1 Pro Preview ≤200K |
$2.00 |
$12.00 |
$14.00 |
|
|
GPT-5.4 |
$2.50 |
$15.00 |
$17.50 |
OpenAI |
|
Gemini 3.1 Pro Preview >200K |
$4.00 |
$18.00 |
$22.00 |
|
|
Claude Opus 4.8 |
$5.00 |
$25.00 |
$30.00 |
Anthropic |
|
GPT-5.5 |
$5.00 |
$30.00 |
$35.00 |
OpenAI |
Sustaining State in Complex Agent Execution Loops
For technical leaders architecting autonomous agents, the primary challenge has often been not initial model intelligence, but rather state decay—the tendency for an agent framework to lose its analytical trajectory during multi-step, long-horizon tasks.
Qwen3.7-Plus addresses this architectural vulnerability through a sophisticated approach to context management and reasoning state preservation.
The model boasts a substantial 1-million token context window, dedicating up to 256K tokens to its internal chain-of-thought processing. This capacity is crucial for complex operations; consider an automated cloud migration agent that can ingest an entire codebase, map dependencies, and dedicate thousands of tokens to evaluating edge cases before executing a single command.
A key feature is the API’s `preserve_thinking` parameter. This capability, introduced in the previous Qwen 3.6 generation and integrated into both its open-weight and proprietary Max models, acts as a standardized architectural bridge within Alibaba’s ecosystem.
At its core, `preserve_thinking` operates at the API and template level to maintain internal `
This structural continuity is vital for developers engaged in long-horizon tasks. By preserving these internal logic loops, the feature prevents the model from losing context or unnecessarily recomputing its historical data midway through an operation.
When a model performs complex, multi-step agentic coding assignments, this state retention ensures the system maintains its original line of reasoning without losing track of its objectives or the underlying logic of its previous actions.
This underlying concept is increasingly defining the architecture across major AI research labs.
Anthropic employs a similar capability, termed “Extended Thinking,” for its advanced models like Claude Opus 4.8. This framework requires developers to feed unmodified thinking blocks back into the API on subsequent turns to sustain a continuous reasoning chain.
OpenAI addresses this challenge through an encrypted reasoning pass-back mechanism for models like GPT-5.5. Within the OpenAI ecosystem, developers must return specific reasoning items generated alongside previous function calls, ensuring the model explicitly recalls the rationale behind its tool executions.
Ultimately, Alibaba’s `preserve_thinking` terminology reflects what has rapidly become an essential requirement for sophisticated multi-turn reasoning in AI systems.
Benchmark Performance: Competitive but Not Yet Leading-Edge
The advanced reasoning architecture of Qwen3.7-Plus translates to notable improvements in multimodal and agentic benchmarks. However, its performance still trails some leading proprietary models from U.S. companies, such as Anthropic’s Claude Opus 4.6 and OpenAI’s GPT-5.4.

On the Terminal Bench 2.0-Terminus, a benchmark measuring an LLM’s capability to safely and iteratively execute terminal commands, Qwen3.7-Plus achieved a score of 70.3. This performance surpasses DeepSeek-V4-Pro Max (67.9) and Gemini-3.1 Pro (63.5).
For computer vision tasks requiring localized interface interpretation, such as those evaluated by ScreenSpot Pro, the model scored 79.0, significantly outperforming established models like GPT-5.4 (xhigh) at 67.4 and Claude-Opus-4.6 at 49.5. Agent Evaluation Metrics (Selected Benchmarks)
Enterprise Considerations for Qwen3.7-Plus
For enterprise architects evaluating Qwen3.7-Plus, the central question is: What current technology stack components can this model replace?
Qwen3.7-Plus is positioned as a viable alternative to premium frontier models (such as GPT-5 or Claude-Max tier models) within high-frequency developer workflows, robotic process automation (RPA), and data engineering pipelines.
Instead of utilizing expensive, general-purpose flagship models for repetitive system operations, technical teams can redirect these tasks to Qwen3.7-Plus, which natively handles visual interface interpretation, command execution, and code generation.
Alibaba’s API delivery is designed for compatibility with existing open-source and proprietary enterprise frameworks. The endpoints are fully OpenAI-compatible, minimizing infrastructure adjustments for teams looking to swap dependencies. For organizations utilizing autonomous terminal frameworks, integration is natively supported across various environments.
Engineers can execute Qwen3.7-Plus directly through their local terminal setups by modifying base environment targets.
The cost of running agent frameworks that frequently access extensive code repositories or visual layout histories can rapidly escalate.
Alibaba addresses this through granular caching pricing tiers.
While standard input processing is priced at $0.40 per million tokens, subsequent reads from an explicitly created cache (e.g., a static enterprise UI kit or large codebase accessed repeatedly) drop to $0.04 per 1M tokens.
This tiered pricing structure makes high-frequency, multi-turn agent iterations economically feasible at an enterprise scale.
Licensing and Compliance Implications of Closed Weights
For legal and security teams evaluating any Qwen model, understanding the licensing framework and data pipeline boundaries is critical.
Unlike previous Qwen iterations, which gained enterprise adoption through freely available open-source weights under licenses like Apache 2.0, Qwen3.7-Plus is strictly offered as a managed, commercial cloud API via Alibaba Cloud Model Studio. This distinction has significant implications for enterprise risk management:
-
No Local Weight Deployment: Organizations cannot download, sandbox, or host Qwen3.7-Plus weights within their own air-gapped data centers. All data processing, visual analysis, and execution requests must route through Alibaba Cloud’s international endpoints.
-
Compliance and Data Sovereignty: As the model necessitates cloud-based inference, companies operating under strict data residency regulations (e.g., healthcare or defense sectors) must meticulously assess whether external API calls align with their specific data sovereignty obligations.
-
Managed Risk Mitigation: Conversely, the managed API structure alleviates the internal burden of provisioning, optimizing, and maintaining extensive GPU infrastructure (such as Nvidia H100 clusters) required for hosting internal agent networks.
Cost-Effective Multimodal Intelligence at Scale
Early feedback from developer communities and technical venture capital underscores the evolving economics of AI agent deployment.
Prominent industry commentator and Web3 venture capitalist @Boxmining noted the strategic cost advantage:
“Qwen 3.7 Plus being 40% cheaper than Max changes the conversation. If the output is close enough for most coding and much stronger for visual workflows, do you really need Max every day or only for the heavy terminal-only jobs?”
This perspective aligns with the current enterprise trend of optimizing operational budgets by shifting from unconstrained compute to targeted task automation.
Specialized researchers within the AI ecosystem highlight that Qwen3.7-Plus represents more than just an incremental improvement in text generation.
Dunjie Lu, a research intern at Alibaba Qwen, stated:
“It shows clear gains over Qwen3.6-Plus in computer-use capabilities, with stronger generalization beyond general desktop tasks into professional workflows such as data engineering and scientific research.”
For enterprises planning their future infrastructure, Qwen3.7-Plus presents a compelling option. If the primary objective is to build resilient, visually-capable autonomous software agents that interact directly with developer environments and cloud consoles—without exceeding inference budgets—this model offers a strong justification for moving away from more expensive frontier alternatives.
Business Style Takeaway: Alibaba’s Qwen3.7-Plus signals a market trend toward more cost-effective, multimodal AI solutions, particularly for enterprise automation tasks that blend code and visual interface interaction. While the move to a closed-source model raises compliance considerations for some, its pricing and sustained context capabilities make it a pragmatic choice for organizations optimizing AI operational expenses for agentic workflows.
Source: : venturebeat.com
