Stanford DeLM Slashes Multi-Agent Costs 50% Without Orchestrator

Most current AI agent frameworks assume that sophisticated agents need a central orchestrator to manage operations, route requests, and keep the system coherent. That assumption may be flawed, and it has significant implications for computational cost and response time.

Stanford University researchers have introduced a framework, the Decentralized Language Model (DeLM), that challenges this assumption. DeLM is built on the premise that AI agents can coordinate effectively through shared state, without a central hub that must process every message and update.

According to co-developers Yuzhen Mao and Azalia Mirhoseini, DeLM uses a shared knowledge base as a “common communication substrate.” Agents build directly on each other’s verified progress, with no central agent consolidating, filtering, and redistributing information. The researchers say this lets agents “build on prior findings, avoid repeated failures, preserve constraints, and recover detailed evidence only when needed.”

The Challenges of Traditional Multi-Agent Systems

Conventional centralized multi-agent systems typically follow a structured workflow: a primary agent dissects tasks into subtasks, delegates them to multiple subordinate agents operating in parallel, awaits their responses, synthesizes intermediate findings, and then issues subsequent directives based on the gathered context.
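To make the contrast concrete, here is a minimal Python sketch of that hub-and-spoke loop. It is an illustration, not code from the paper or any framework: `orchestrate`, `decompose`, and `worker` are hypothetical names, and the workers are plain functions standing in for LLM calls. Every result passes through the controller before any other worker can see it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical signatures: decompose(task, findings) -> subtasks,
# worker(subtask, context) -> result. Both stand in for LLM calls.
Decompose = Callable[[str, List[str]], List[str]]
Worker = Callable[[str, str], str]


def orchestrate(task: str, decompose: Decompose, worker: Worker,
                max_rounds: int = 3) -> List[str]:
    """Centralized loop: only the controller holds and redistributes findings."""
    findings: List[str] = []
    for _ in range(max_rounds):
        subtasks = decompose(task, findings)  # controller splits the work
        if not subtasks:
            break
        # Workers see only the context the controller chooses to pass down.
        context = "; ".join(findings)
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda s: worker(s, context), subtasks))
        # Every result funnels back to the controller before the next round.
        findings.extend(results)
    return findings
```

With n subtasks per round, the controller reads and repackages all n results every round. That integration step is the one the researchers describe as a bottleneck.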

While this method facilitates the scaling of Large Language Model (LLM) reasoning, the Stanford team argues that its scalability is inherently limited. Every successful discovery, partial result, or encountered failure must be reported back to the central agent, which then decides how to integrate and redistribute this information to the other agents.

“As the number of subtasks grows, this controller becomes a communication and integration bottleneck,” Mao and Mirhoseini note. Furthermore, the central orchestrator risks “diluting, omitting, or distorting” crucial information, potentially leading to lost progress.

The bottleneck also affects tasks that require complex, long-context reasoning. After receiving reports from sub-agents, a central agent typically groups related concepts and data points into clusters using an unsupervised grouping step. It may then assign these “evidence clusters” to sub-agents before it fully understands how relevant or correct they are once combined.

When a sub-agent receives inadequate context, it can lose track of the task and ask the central agent for another round of retrieval or delegation. “This back-and-forth makes coordination slower, more iterative, and increasingly constrained by a single overloaded main agent,” the researchers state.

DeLM’s Approach and Functionality

DeLM, by contrast, is built around three components: parallel agents, a shared context, and a dynamic task queue.

The shared context functions as a curated repository of “gists”—concise summaries of information potentially valuable to other agents. These summaries encompass verified findings, evidence-backed conclusions, preliminary results, and documented failures, all linked to detailed underlying evidence that agents can access as needed for their specific tasks.

The task queue holds the pending subtasks, which agents select and execute independently.

“Agents write compact, verified updates into a shared context that later agents can read directly,” the researchers explain. This fosters a cumulative “shared problem state” comprising useful findings, failures, and constraints, bypassing the need for a central controller.
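A shared context of this kind can be pictured as an append-only board that rejects anything unverified. The sketch below assumes a simple `Gist` record with a kind, a short text, and a pointer to its evidence. The class names, the `evidence_ref` field, and the pluggable `verify` callback are illustrative assumptions, not DeLM’s published interface.

```python
from dataclasses import dataclass
from enum import Enum
from threading import Lock
from typing import Callable, List


class Kind(Enum):
    FINDING = "finding"
    CONCLUSION = "conclusion"
    PRELIMINARY = "preliminary"
    FAILURE = "failure"


@dataclass(frozen=True)
class Gist:
    kind: Kind
    text: str          # compact summary that other agents read by default
    evidence_ref: str  # hypothetical pointer to the detailed evidence


class SharedContext:
    """Append-only board of gists; only verified entries are published."""

    def __init__(self, verify: Callable[[Gist], bool]):
        self._verify = verify
        self._gists: List[Gist] = []
        self._lock = Lock()

    def publish(self, gist: Gist) -> bool:
        if not self._verify(gist):  # unverified gists never reach other agents
            return False
        with self._lock:
            self._gists.append(gist)
        return True

    def read(self) -> List[Gist]:
        with self._lock:
            return list(self._gists)  # snapshot, so readers never see a partial write
```

For example, a verifier that requires every gist to point at some evidence would be `SharedContext(verify=lambda g: bool(g.evidence_ref))`.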

The DeLM pipeline operates as follows:

Initialization: Initial inputs are segmented into distinct work units and populated into the task queue.
Parallel Execution: Agents operate independently and concurrently, drawing tasks from the queue and referencing the shared context as they advance.
Compression and Verification: Results are condensed into reusable “gists” and checked against their supporting evidence. Only verified gists are shared with the other agents.
Additional Work (If Necessary): Once every task in the queue is complete, the agent that finishes the last task reviews the accumulated shared context to determine whether further work is needed.
Finalization: When that agent determines that no further steps are required, it returns the final answer.
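The five steps can be sketched with ordinary Python threads. This is a minimal sketch that makes several assumptions: gists are plain strings, and `solve`, `verify`, `review`, and `finalize` are hypothetical callables standing in for the agents’ LLM calls. It shows the control flow the researchers describe. Agents pull work themselves, only verified gists enter the shared context, and whichever agent finishes the last pending task decides whether to queue more work or return the answer.

```python
import threading
from collections import deque
from typing import Callable, Deque, List


def run_delm(units: List[str],
             solve: Callable[[str, List[str]], str],
             verify: Callable[[str], bool],
             review: Callable[[List[str]], List[str]],
             finalize: Callable[[List[str]], str],
             n_agents: int = 4) -> str:
    if not units:
        return finalize([])
    queue: Deque[str] = deque(units)  # 1. Initialization: work units fill the queue
    shared: List[str] = []            # shared context: verified gists only
    state = {"pending": len(units), "done": False, "answer": ""}
    cond = threading.Condition()

    def agent() -> None:
        while True:
            with cond:
                while not queue and not state["done"]:
                    cond.wait()               # idle until work appears or the run ends
                if state["done"]:
                    return
                task = queue.popleft()        # claim a task; no controller assigns it
                snapshot = list(shared)       # read the shared context as it stands
            gist = solve(task, snapshot)      # 2. Parallel execution; returns a compact gist
            with cond:
                if verify(gist):              # 3. Verification gate before publishing
                    shared.append(gist)
                state["pending"] -= 1
                is_last = state["pending"] == 0  # this agent finished the last pending task
                view = list(shared)
            if not is_last:
                continue
            follow_ups = review(view)         # 4. Last finisher looks for more work
            with cond:
                if follow_ups:
                    queue.extend(follow_ups)
                    state["pending"] += len(follow_ups)
                else:
                    state["answer"] = finalize(view)  # 5. Finalization
                    state["done"] = True
                cond.notify_all()             # wake idle agents either way

    threads = [threading.Thread(target=agent) for _ in range(n_agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["answer"]
```

Because `solve` runs outside the lock, agents work concurrently. The lock only guards the queue, the shared context, and the pending-task count.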

The researchers elaborate that agents “exchange progress through shared state, asynchronously claim ready tasks, and scale more adaptively as the number of subtasks grows.”
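“Asynchronously claim ready tasks” implies that each task goes to exactly one agent, and only when it can actually run. One way to sketch that is a board where claiming is atomic. This sketch assumes that tasks may declare dependencies on other tasks, which is an illustrative assumption; the article does not say how DeLM defines readiness. `Task`, `TaskBoard`, `claim_ready`, and `complete` are hypothetical names.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class Task:
    name: str
    deps: Set[str] = field(default_factory=set)  # assumed: names of prerequisite tasks


class TaskBoard:
    """Hands each task to exactly one agent, and only once its dependencies are done."""

    def __init__(self, tasks: Iterable[Task]):
        self._todo: Dict[str, Task] = {t.name: t for t in tasks}
        self._done: Set[str] = set()
        self._lock = threading.Lock()

    def claim_ready(self) -> Optional[Task]:
        with self._lock:  # check-and-remove is atomic, so two agents never share a task
            ready = next((n for n, t in self._todo.items() if t.deps <= self._done), None)
            return self._todo.pop(ready) if ready is not None else None

    def complete(self, task: Task) -> None:
        with self._lock:
            self._done.add(task.name)
```

An agent loop would call `claim_ready()`, do the work, and then call `complete()`. If `claim_ready()` returns `None`, the agent can wait instead of asking a controller for instructions.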

DeLM’s Performance in Practical Applications

The DeLM framework empowers agents to avoid redundant efforts, leverage each other’s successes and failures, and concentrate on unresolved challenges.

This architecture is particularly useful in software engineering, especially for test-time scaling, in which a model is given additional compute at inference time to reason through a problem. Different agents can explore their own hypotheses or pursue distinct lines of reasoning in parallel while still contributing to a shared pool of intermediate findings. Concurrent debugging is a prime example.

DeLM is also well-suited for tasks involving long-context reasoning and multi-document question answering. Agents can concurrently analyze their respective evidence clusters—collections of papers, code, or other relevant materials—while maintaining a consolidated “global compact view” of the evidence compiled collectively.
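Here is a rough sketch of that pattern. Each agent condenses one evidence cluster into a gist, reads the gists other agents have already posted, and writes its own into a shared view keyed by cluster. The `summarize` callback is a hypothetical stand-in for an LLM call, and the dictionary view is an assumption. The point is that the global view holds gists, not the raw documents.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def build_global_view(clusters: Dict[str, List[str]],
                      summarize: Callable[[List[str], List[str]], str]) -> Dict[str, str]:
    """Each agent condenses one evidence cluster into a gist; only gists are shared."""
    view: Dict[str, str] = {}  # global compact view: cluster id -> gist
    lock = threading.Lock()

    def agent(cluster_id: str) -> None:
        with lock:
            seen = list(view.values())  # gists other agents have already posted
        gist = summarize(clusters[cluster_id], seen)  # hypothetical LLM call
        with lock:
            view[cluster_id] = gist

    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(agent, clusters))  # list() surfaces any worker exception
    return view
```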

The research indicates that DeLM enhances the accuracy of agentic tasks while significantly reducing operational costs. Empirical results on real-world benchmarks substantiate this claim. On SWE-bench Verified, which assesses AI models’ proficiency in resolving practical software engineering issues, DeLM demonstrated a 10.5% improvement over the leading baseline and achieved approximately a 50% reduction in cost per task.

Furthermore, on LongBench-v2 Multi-Doc QA, a benchmark that evaluates how well LLMs handle long contexts in realistic tasks, DeLM achieved higher accuracy than the baselines across four major model families: GPT-5.4, Claude Sonnet, Gemini Flash, and DeepSeek-V4-Pro.

On X, Mao outlined several factors behind DeLM’s results on SWE-bench. First, the system shares failures. In typical parallel runs, a failed path stays isolated with the agent that took it, so later agents may waste resources on the same unproductive trajectory. DeLM instead writes failed hypotheses into the shared context.

“Later agents can read them as constraints, avoid repeated exploration, and redirect their search toward more promising fixes,” Mao stated.

Second, verified constraints are written into the shared context immediately, where they become binding collective state. “Later agents inherit them, build around them, and avoid repeating globally invalid simplifications,” Mao added.
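Failures and constraints can be treated as a single shared filter on what remains worth trying. The sketch below is illustrative only. `SharedState`, `record_failure`, `add_constraint`, and the idea of encoding a constraint as a predicate over candidate fixes are all assumptions, not details from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class SharedState:
    failed: Set[str] = field(default_factory=set)  # hypotheses already shown not to work
    # Verified constraints as named predicates (an assumed encoding): True = allowed.
    constraints: Dict[str, Callable[[str], bool]] = field(default_factory=dict)

    def record_failure(self, hypothesis: str) -> None:
        self.failed.add(hypothesis)

    def add_constraint(self, name: str, allows: Callable[[str], bool]) -> None:
        self.constraints[name] = allows

    def filter(self, candidates: List[str]) -> List[str]:
        """What a later agent still needs to try after inheriting the shared state."""
        return [h for h in candidates
                if h not in self.failed and all(ok(h) for ok in self.constraints.values())]
```

For example, after one agent records that raising a timeout did not help, and another verifies that the public API must not change, a later agent’s candidate list shrinks to the fixes that neither repeat the failure nor break the constraint.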

Crucially, DeLM maintains a compact representation of shared progress, enabling efficient reuse. Its “unfolding” mechanism allows agents to access concise gists by default, with the option to expand them into more detailed summaries and raw evidence if needed.

The researchers point out that while providing all raw documents and traces offers maximum information, it can overwhelm agents’ context windows and inflate costs.

“If agents shared full traces, each worker would need to read long command histories, file dumps, failed edits, and intermediate reasoning, turning coordination itself into another long-context bottleneck,” Mao explained.

Conversely, sharing only compact summaries, while more cost-effective, risks losing critical details and evidence, thereby compromising reasoning reliability.

The unfolding feature thus offers a “coarse-to-fine” access model, allowing agents to retrieve detailed information selectively. This approach can simultaneously enhance accuracy and manage costs.
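The coarse-to-fine idea fits in a few lines. In this sketch, each shared entry carries three levels of detail: gist, summary, and raw evidence. A reading agent picks a level for each entry. The `Entry` class, `unfold` method, and `choose_level` callback are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Entry:
    gist: str      # level 0: short gist, read by default
    summary: str   # level 1: more detailed summary
    evidence: str  # level 2: raw evidence such as a trace or file dump

    def unfold(self, level: int = 0) -> str:
        """Return the entry at the requested detail level, clamped to 0..2."""
        return (self.gist, self.summary, self.evidence)[min(max(level, 0), 2)]


def read_context(entries: List[Entry], choose_level: Callable[[Entry], int]) -> str:
    """Build an agent's view: coarse by default, expanded only where it chooses."""
    return "\n".join(e.unfold(choose_level(e)) for e in entries)
```

An agent that only needs the tokenizer evidence might pass `lambda e: 2 if "tokenizer" in e.gist else 0`. Every other entry then costs only its gist in the context window.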

Ultimately, frameworks like DeLM enable agents to operate with greater efficiency by preventing redundant information processing and repeated failed analyses. They achieve higher effectiveness through the propagation of successful findings across parallel threads and greater robustness by exclusively sharing verified claims.

For enterprise developers, DeLM challenges a foundational assumption: that all multi-agent workflows necessitate a central control entity. The performance metrics achieved on SWE-bench and LongBench-v2 suggest that this decentralized model is not merely a theoretical improvement but also delivers tangible benefits in terms of speed, accuracy, and cost reduction.

Business Style Takeaway: Decentralized coordination frameworks like DeLM signal a shift away from centralized control and promise substantial reductions in inference cost and latency. Businesses can use this architecture to build more efficient, scalable, and accurate AI systems, particularly for complex problem-solving in software engineering and extensive data analysis. On SWE-bench, DeLM roughly halved the cost per task.

Original article: venturebeat.com
