Sakana’s 7B Model Orchestrates GPT, Claude, and Gemini LLMs

Every LangChain pipeline your team hardcodes starts breaking the moment the query distribution shifts — and it always shifts. That bottleneck is what Sakana AI set out to eliminate.

Researchers at Sakana AI have introduced the “RL Conductor,” a small language model trained via reinforcement learning to automatically orchestrate a diverse pool of worker LLMs. Conductor dynamically analyzes inputs, distributes labor among workers, and coordinates among agents.

This automated coordination achieves state-of-the-art results on difficult reasoning and coding benchmarks, outperforming individual frontier models like GPT-5 and Claude Sonnet 4 as well as expensive human-designed multi-agent pipelines. It achieves this performance at a fraction of the cost and with fewer API calls than competitors. RL Conductor is the backbone of Fugu, Sakana AI’s commercial multi-agent orchestration service.

The limitations of manual agentic frameworks

Large language models possess strong latent capabilities. However, tapping these capabilities to their fullest presents a significant challenge. Extracting this level of performance relies heavily on manually designed agentic workflows, which serve as critical components in commercial AI products.

Nevertheless, these frameworks fall short due to their inherent rigidity and constraints. In comments to VentureBeat, Yujin Tang, co-author of the paper, explained the exact breaking point of current systems: “While using frameworks with hard-coded pipelines like LangChain and Mixture-of-Agents can work well for specific use cases… In production, an inherent bottleneck arises when targeting domains with large user bases with very heterogeneous demands.”

Tang noted that achieving “real-world generalization in such heterogeneous applications inherently necessitates going beyond human-hardcoded designs.”

Another significant bottleneck in building robust agentic systems is that no single model is optimal for all tasks. Different models are fine-tuned to specialize in distinct domains. One model might excel at scientific reasoning, while another is superior at code generation, mathematical logic, or high-level planning.

Given these varying characteristics and complementary skills among models, manually predicting and hard-coding the ideal combination of models for every query is practically impossible. An optimal agentic framework should be able to analyze a problem and delegate subtasks to the most suitable expert in the pool.

Conducting an orchestra of agents

The RL Conductor is engineered to overcome the limitations of rigid, human-designed frameworks. As its name implies, it orchestrates a collection of agents by dividing complex problems, delegating targeted subtasks, and designing communication topologies for a set of worker LLMs.

Instead of relying on fixed code or static routing, the Conductor orchestrates these models by generating a customized workflow. For each step in the workflow, the model generates a natural language instruction for a specific aspect of the task, assigns an agent to carry it out, and defines an “access list” that dictates which past subtasks and responses from other agents are included in that agent’s context.

By defining everything in natural language, the Conductor builds flexible workflows tailored to each input. It can construct simple sequential chains, parallel tree structures, or even recursive loops depending on the problem’s demands.

Sakana's 7B Model Orchestrates GPT, Claude, and Gemini LLMs 3

Crucially, the model learns these strategies not through human design but via reinforcement learning (RL) and reward maximization. During training, the model is presented with a task, a pool of workers, and a reward signal based on the correctness of its answer and output format.

Through a simple trial-and-error RL algorithm, the model organically discovers which combinations of instructions and communication structures yield the highest reward. Consequently, it automatically adopts advanced orchestration strategies such as targeted prompt engineering, iterative refinement, and meta-prompt optimization.

The model learns to dynamically adjust its strategies and leverage the distinct strengths of its worker agents without any human developer needing to hard-code the process.

Conductor in action

To test RL Conductor in practice, the researchers fine-tuned the 7-billion parameter Qwen2.5-7B using the framework. During training, the Conductor was tasked with designing agentic workflows of up to five steps. It had access to a worker pool containing seven different models: three closed-source giants (Gemini 2.5 Pro, Claude-Sonnet-4, and GPT-5) and four open-source models (including DeepSeek-R1-Distill-Qwen-32B, Gemma3-27B, and Qwen3-32B).

The team evaluated the Conductor across a variety of highly challenging benchmarks, comparing it against individual frontier models acting alone, self-reflection agents prompted iteratively to improve their own answers, and state-of-the-art multi-agent routing frameworks like MASRouter, Mixture-of-Agents (MoA), RouterDC, and Smoothie. The small 7B Conductor established new benchmarks across the board. It achieved an average score of 77.27% across all tasks, hitting 93.3% on the AIME25 math benchmark, 87.5% on GPQA-Diamond, and 83.93% on LiveCodeBench, according to the researchers.

Remarkably, it achieved these results while maintaining high efficiency. While baseline models like MoA consumed 11,203 tokens per question, the Conductor used an average of just 1,820 tokens, requiring an average of only three steps per workflow.

Sakana's 7B Model Orchestrates GPT, Claude, and Gemini LLMs 4

A closer examination of the experimental details reveals precisely why the framework is so effective. The Conductor automatically learned to gauge task difficulty. For simple factual recall questions, it often resolved the problem in a single step or employed a basic two-agent setup. However, for complex coding challenges, it constructed extensive workflows involving up to four agents with dedicated planning, implementation, and verification phases.

The Conductor also recognized that frontier models possess different strengths. To achieve record scores on coding benchmarks, the Conductor frequently assigned Gemini 2.5 Pro and Claude Sonnet 4 to act as high-level planners, and only involved GPT-5 at the very end to write the final optimized code. In a particularly astute display of adaptability, the Conductor would sometimes fully relinquish its own role, delegating the entire planning process to Gemini 2.5 Pro and allowing it to dictate the subtasks for the rest of the pool.

Beyond math and coding benchmarks, Sakana AI is already leveraging the underlying architecture in practical enterprise applications. “We have been using our Fugu models based on the Conductor technology internally for various practical enterprise applications: software development, deep research, strategy development, and even visual tasks like slide generation,” Tang stated.

Bringing orchestration to the enterprise: Sakana Fugu

While the 7B model detailed in the research paper served as an exploratory blueprint and is not publicly available, Sakana AI has productized the Conductor framework into its flagship commercial AI product, Sakana Fugu. Currently in its beta phase, Fugu functions as a multi-agent orchestration system accessible through a standard OpenAI-compatible API.

Tang indicated that Fugu targets “the large market of industries where AI adoption has yet to yield significant productivity gains due to the generalization limitations of current hard-coded pipelines, such as finance and defense.”

For enterprise developers, this enables seamless integration into existing applications without the complexity of managing multiple API keys or manually routing tasks across different vendors. Beneath the API interface, Fugu automates intricate collaboration topologies and role assignments across a pool of models. To accommodate varying business needs, Sakana released two variants: Fugu Mini, optimized for low-latency operations, and Fugu Ultra, designed for maximum performance on demanding workloads.

Addressing governance concerns regarding autonomous agents initiating invisible workflows, Tang pointed out that the interpretability risks are functionally similar to the hidden reasoning traces of current top-tier closed APIs, and the system is managed with established guardrails to minimize hallucinations.

For enterprise architects evaluating whether to deploy RL-orchestration versus traditional routing, the decision often hinges on engineering resources. “We believe the absolute sweet spot arises whenever users and their teams feel they are spending a disproportionate amount of time guiding their underlying agents,” Tang explained. However, he cautioned that the framework is not necessary for all applications, noting that “it’s hard to beat the economic proposition of a local model running directly on the user’s machine for simple queries.”

As the ecosystem of specialized open- and closed-source AI models continues to expand, static hardcoded pipelines will inevitably become obsolete. Looking forward, this dynamic orchestration is likely to extend beyond text and code environments. “There is indeed a large potential to fill this gap with cross-modal Conductor frameworks becoming the foundation for more autonomous, self-coordinating physical AI systems,” Tang concluded.

Business Style Takeaway: Sakana AI’s RL Conductor represents a significant advancement in AI orchestration, moving beyond static, developer-defined workflows to dynamic, self-optimizing systems. This innovation offers enterprises a path to unlock greater productivity from AI by enabling models to intelligently route tasks to specialized agents, promising higher performance and efficiency for complex, heterogeneous workloads.

Details can be found on the website : venturebeat.com

No votes yet.
Please wait...

Leave a Reply

Your email address will not be published. Required fields are marked *