Resolve AI Tackles AI Coding Boom Breaking Production Systems

Resolve AI, the production-operations startup backed by Greylock and Lightspeed Venture Partners, has unveiled a significant platform expansion. This upgrade introduces persistent background agents, a revamped investigation framework, and a collaborative workspace where engineers and AI agents work together in real-time on live incidents.

A Multi-Agent Approach to Incident Resolution

At the core of this release is a new multi-agent investigation system developed internally. Earlier versions deployed a single AI agent to diagnose production failures, much like a lone engineer on call; the platform now deploys a coordinated team of specialized agents. These agents pursue multiple hypotheses simultaneously, independently validate each other’s findings, and construct complete causal chains from root cause to symptom. Resolve AI says that, on its internal evaluations, the new architecture more than doubles root-cause accuracy compared with earlier versions.

“Think of a single agent being on call, the way a human would be,” Resolve AI CEO and co-founder Spiros Xanthos explained in an exclusive interview. “We now have a team of agents that all work together, almost like a team of humans debugging an issue, and that has improved quality by 2x.”

This announcement comes at a critical juncture for the software industry. The proliferation of AI-powered code generation has dramatically accelerated software development. However, operating the resulting systems in production (debugging, monitoring, and auditing) remains largely manual. Resolve AI, which recently secured $125 million in Series A funding at a $1 billion valuation, is positioning itself as a leader on the operational side of the software lifecycle, which it sees as the next major area for AI investment.

Evaluating Accuracy Claims Through Real-World Scenarios

Startups’ accuracy claims often invite scrutiny, and Xanthos was forthcoming about the scope and limitations of their evaluation methodology. The stated 2x improvement stems from internal benchmarks, not third-party audits. However, the evaluation set was meticulously constructed to mirror the complexity encountered daily by Resolve AI’s enterprise clientele.

“These are very hard, complex evaluations that we built over time to represent real-world examples,” Xanthos elaborated. “This is not customer data, but these evaluations represent difficult cases similar to what we’ve seen at some of the largest tech companies we work with.” He characterized the dataset as comprising hundreds of cases reflecting production failures common at companies such as Coinbase, Salesforce, DoorDash, and Zscaler, all of whom are Resolve AI customers.

The practical implications of this enhanced accuracy are substantial. Resolve AI’s agents now serve as the initial responders for every on-call alert, typically completing their triage within five minutes, often before human engineers are even engaged. Previous company disclosures have highlighted DoorDash’s reduction in time to root cause by up to 87 percent. When asked to provide context for this figure, Xanthos described the typical baseline scenario: “When something goes wrong, it might take five to 10 minutes for a human to even get their laptop and connect,” he stated. “The typical MTTR [Mean Time To Resolution] is in the tens of minutes, sometimes hours, depending on severity. So an improvement of 80-plus percent—four to five times faster—is actually huge. It’s something we’ve never achieved before with AI, tools, data, or observability.”
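The percentage reductions and the "times faster" figures quoted above are two views of the same ratio. The following is a minimal sketch of that arithmetic; the function name is illustrative and not taken from any Resolve AI material.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a percent reduction in elapsed time into a 'times faster' factor."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be in the range [0, 100)")
    remaining_fraction = 1 - reduction_pct / 100  # share of the original time still needed
    return 1 / remaining_fraction

for pct in (75, 80, 87):
    print(f"{pct}% less time -> {speedup_from_reduction(pct):.1f}x faster")
```

Under this reading, a 75 to 80 percent reduction corresponds to the "four to five times faster" in the quote, and the 87 percent figure cited for DoorDash would correspond to roughly 7.7 times faster.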

Cross-Verification Among AI Agents Prevents Hallucinated Root Causes

A significant challenge in applying large language models (LLMs) to critical production environments is their propensity to generate plausible but incorrect responses. In the context of a live outage, this could lead engineering teams down the wrong path, prolonging downtime.

Xanthos directly addressed this concern: “This is a very common issue with models out of the box,” he noted. “They always try to give you an answer, and if they don’t have enough evidence, they’ll give you the best possible answer—which is likely to be wrong.”

Resolve AI’s solution involves a system of layered verification among its agents. Each agent tasked with investigating a hypothesis must cite all supporting evidence and submit it for independent review by another agent. The investigating agent must construct a complete causal chain, from the root cause to the observed symptom. Peer agents actively work to disprove these theories by identifying logical inconsistencies or gaps in evidence.
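The review step described above can be pictured as a structural check on a hypothesis: does the causal chain run unbroken from the claimed root cause to the observed symptom, and is every link backed by cited evidence? The sketch below is a hypothetical illustration only. The `Hypothesis` type, the `review` function, and the sample incident are assumptions made for this example, not Resolve AI's implementation.

```python
from dataclasses import dataclass, field

Link = tuple[str, str]  # (cause, effect)

@dataclass
class Hypothesis:
    root_cause: str
    symptom: str
    chain: list[Link]  # ordered links from root cause to symptom
    evidence: dict[Link, list[str]] = field(default_factory=dict)  # citations per link

def review(h: Hypothesis) -> list[str]:
    """Peer check: return the gaps found; an empty list means none were found."""
    if not h.chain:
        return ["no causal chain provided"]
    gaps = []
    if h.chain[0][0] != h.root_cause:
        gaps.append("chain does not start at the claimed root cause")
    if h.chain[-1][1] != h.symptom:
        gaps.append("chain does not end at the observed symptom")
    # Each link's effect must be the next link's cause.
    for (_, effect), (next_cause, _) in zip(h.chain, h.chain[1:]):
        if effect != next_cause:
            gaps.append(f"broken link between {effect!r} and {next_cause!r}")
    # Every link needs at least one cited piece of evidence.
    for link in h.chain:
        if not h.evidence.get(link):
            gaps.append(f"no evidence cited for {link[0]!r} -> {link[1]!r}")
    return gaps

# Hypothetical incident: the second link has no supporting evidence.
h = Hypothesis(
    root_cause="bad config deploy",
    symptom="checkout 5xx errors",
    chain=[("bad config deploy", "connection pool exhausted"),
           ("connection pool exhausted", "checkout 5xx errors")],
    evidence={("bad config deploy", "connection pool exhausted"): ["deploy log entry"]},
)
print(review(h))
```

A real reviewer agent would presumably also test whether the cited evidence actually supports each link; this sketch checks only that the chain is complete and that citations exist.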

“Often, agents actually disprove those theories because they find gaps,” Xanthos remarked. “There are many layers of defense and agentic checks that allow Resolve to be very accurate and not mislead.”

Equally crucial, he emphasized, is the system’s ability to indicate when it lacks sufficient information. “The bar to actually saying ‘I have the answer’ is very high. In those cases, it will say, ‘This is the evidence I found. Here are three or four paths you can take from here, but I wasn’t able to fully prove that this is the problem.’ A system like this that operates in production cannot be a black box.”

In high-stakes operational contexts, calibrated uncertainty can be more valuable than confident but incorrect outputs. For an AI system integrated into incident response, providing misleading information during a customer-impacting outage could exacerbate the problem it aims to solve.

Introducing Persistent Background Agents

Beyond immediate incident response, Resolve AI is introducing a new category of background agents designed to manage the continuous, often unseen, operational tasks that engineering teams struggle to perform consistently at scale.

These agents operate on schedules or are triggered by specific events—such as a new deployment, a critical alert, or a merged pull request. They accumulate institutional knowledge from every investigation and human interaction over time. When an engineer accesses the Resolve AI interface, these agents have already been proactively investigating priority issues, monitoring deployments, auditing alert configurations, identifying configuration drift, and flagging cost anomalies.
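Event-triggered work of this kind is commonly modeled as a registry that maps event types to handlers. The sketch below shows that general pattern under stated assumptions: the event names, payload fields, and handler functions are hypothetical, and nothing here reflects Resolve AI's actual interfaces. Scheduled triggers are omitted for brevity.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], str]

class TriggerRegistry:
    """Maps event types to the background tasks that should run when they occur."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def on(self, event_type: str):
        """Decorator that registers a handler for one event type."""
        def decorator(fn: Handler) -> Handler:
            self._handlers[event_type].append(fn)
            return fn
        return decorator

    def dispatch(self, event_type: str, payload: dict) -> list[str]:
        """Run every handler registered for the event; unknown events run nothing."""
        return [handler(payload) for handler in self._handlers.get(event_type, [])]

registry = TriggerRegistry()

@registry.on("pull_request.merged")  # hypothetical event name
def check_production_impact(payload: dict) -> str:
    return f"queued impact check for PR {payload['pr']}"

@registry.on("deployment.created")  # hypothetical event name
def watch_deployment(payload: dict) -> str:
    return f"watching deployment of {payload['service']}"

print(registry.dispatch("pull_request.merged", {"pr": 101}))
```

The point of the pattern is that an agent stays idle until a matching event arrives, so an always-available agent need not be doing work continuously.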

Xanthos differentiated these background agents from the incident-response agents that have been Resolve AI’s primary offering. “You can now have these agents run in the background at all times—not only when a human asks an agent to debug a problem or when an alert fires,” he stated. “A lot of our customers are now monitoring changes that land in production before they cause an issue. There’s an agent that monitors those all the time.”

He described these background agents as “general-purpose SRE [Site Reliability Engineering] agents available to every developer,” capable of handling a range of tasks from monitoring infrastructure changes that might impact cloud costs to conducting post-incident follow-up, such as generating code fixes based on incident learnings.

This addresses a fundamental challenge in software operations: the daily tasks essential for maintaining healthy production systems (monitoring deployments, investigating alerts, tracking changes across complex environments) are critical but often reactive and manual. Engineering organizations recognize the importance of this work, but it frequently competes with feature development priorities. Automated agents performing these tasks continuously can shift teams from reactive problem-solving to proactive operational management.

A Collaborative Workspace for Engineers and AI Agents

The third major component of this release is the “shared investigation surface”—a workspace where engineers and AI agents can interact with the same live evidence during an active incident. Reports are updated dynamically as investigations progress, with every finding being inspectable. Engineers can conduct parallel investigations without disrupting the primary workflow. Source queries can be accessed and modified in place, evidence is embedded directly within the workspace, and remediation actions can be initiated from the same interface, eliminating the need to switch between tools.

“Think of it as an interface to all the production tools, but also an interface where humans and agents can collaborate with each other—or agents with agents,” Xanthos commented. “That’s what gradually leads to more trust and more automation, because you work with the agent, you teach it, you see the results.”

The company is also making its platform accessible via a REST API and an MCP (Model Context Protocol) server. This enables engineering teams to integrate Resolve AI into broader agentic workflows and infrastructure. According to Xanthos, this integration is already occurring in practice: “A general-purpose agent that a company has built—when it comes to debugging, that agent could invoke Resolve,” he said. “Or somebody works on their coding agent on the laptop, and Resolve shows up there as an MCP. If there is some production-related activity, the coding agent can invoke it.” This interoperability strategy signifies Resolve AI’s positioning not as an isolated system, but as a specialized component within a larger ecosystem of AI agents that will increasingly hand off tasks to one another—a model Xanthos likened to the open architecture of the web rather than the closed-garden approach of an app store.
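For readers unfamiliar with MCP, a client invokes a server's tool by sending a JSON-RPC 2.0 request with the method `tools/call`, naming the tool and passing its arguments. The sketch below builds such a request. The envelope follows the Model Context Protocol; the tool name `investigate_alert` and its argument are hypothetical placeholders, since the article does not say which tools Resolve AI's server exposes.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id per call

def mcp_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP 'tools/call' request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)

# Hypothetical tool name and argument, for illustration only.
print(mcp_tool_call("investigate_alert", {"alert_id": "example-123"}))
```

A coding agent on a developer's laptop would send a message like this over whatever transport the MCP server supports and receive the tool's result in the matching JSON-RPC response.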

Resolve AI’s Competitive Edge Against Incumbents

The landscape for AI-driven operations has become increasingly competitive over the past year, with major players like Datadog, PagerDuty, and cloud providers introducing their own AI-augmented operational capabilities. When asked to differentiate Resolve AI from these established entities, Xanthos highlighted the company’s foundational technical depth.

“We’re operating at the frontier here. There’s no blueprint for how you build a system like Resolve,” he stated. He noted that he and co-founder Mayank Agarwal were instrumental in creating OpenTelemetry, the widely adopted open-source project in observability, which now serves as the de facto standard for collecting telemetry data from modern software systems.

Xanthos also emphasized the recent establishment of the company’s AI Lab, led by a researcher formerly responsible for the post-training phase of Meta’s Llama models. “He managed to combine deep expertise of production observability with AI and models, and I think that’s very unique,” Xanthos commented. “I don’t believe any other company, whether it comes from an observability background or it’s a startup, has all of that together.”

The company’s structural advantages, according to Xanthos, include a comprehensive environment model developed for each customer, a memory system that learns within the specific production environment of each client, and its sophisticated multi-agent architecture. The AI Lab is currently focused on fine-tuning frontier models with production-specific data—the procedural knowledge that experienced engineers leverage for debugging, which is typically absent from standard model training datasets. This approach aligns with a growing trend among AI application companies: utilizing foundational frontier models as a base while investing heavily in domain-specific fine-tuning, retrieval mechanisms, and specialized agent architectures to achieve accuracy levels unattainable by general-purpose models alone.

Outcome-Based Pricing Redefines AI Economics in Production

Resolve AI’s pricing model deviates from traditional enterprise software licensing. The company employs a credit-based system, where credits are consumed as agents perform tasks, reflecting an outcome-based approach that directly links cost to delivered value.

“We’re not selling software,” Xanthos stated. “The way you buy and use Resolve is by buying credits that are consumed when Resolve performs an action. It’s outcome-based. Only when Resolve troubleshoots an alert—that’s the only time that it consumes credits.”

Addressing potential cost concerns, he argued that Resolve AI offers a more economical solution compared to building a similar system internally using cutting-edge models and MCP integrations. “If you were to take Opus or GPT-5.4 and try to build a solution like Resolve with MCPs, we measured—you actually end up consuming a lot more in tokens than what you have to pay Resolve, because our system is very optimized in terms of context, in terms of how it reads time-series data.”

Regarding the always-on background agents, Xanthos clarified that their persistent nature does not equate to continuous high-resource usage. “The background agent doesn’t mean it does intensive work all the time. It means that it can be there; you can give it any task you want. A lot of these tasks are triggered based on some action: an alert happens, somebody merges a PR, and you want to see if it has an impact on production.”

For enterprise clients in regulated sectors, such as Coinbase and Zscaler, data residency and security are paramount. Resolve AI accommodates these requirements through a flexible deployment model: the data plane resides within the customer’s existing tool infrastructure, while the inference layer can be deployed as standard SaaS or within a customer-specific Virtual Private Cloud (VPC). “We designed Resolve to work with the large enterprises where security standards are the highest,” Xanthos affirmed. “There are many measures we take to ensure Resolve is secure, including not retaining data.”

Building Trust in AI Agents for Production Systems

A central cultural challenge accompanying the rise of AI is the question of whether engineering teams will entrust AI agents with autonomous actions in production environments—such as rolling back deployments, scaling resources, or generating code patches. Xanthos drew a parallel to the adoption of autonomous vehicles.

“For us to allow a car to drive on its own on the street, we have to prove that it’s safer than a human. Agents in production is a very similar concept,” he observed. While acknowledging that not all customers are immediately comfortable with automated actions, he described an evolving gradient of trust that is expected to advance rapidly.

“There is a set of actions that are relatively risk-free that most tech companies probably are comfortable having an agent take, and probably there is another set of actions for which the human has to approve,” he said. “But as quality keeps climbing the way we see at Resolve, I would say we’re going to cross the threshold this year where most of the actions will be taken by an agent automatically.”

He outlined the typical adoption trajectory: companies initially utilize agents for recommendations, with human oversight required for execution. Over time, trust is incrementally built. “I don’t think this is a problem where we just let the agents run wild from the beginning,” Xanthos stated. This phased approach mirrors historical enterprise technology adoption patterns, where organizations typically progress at a pace dictated by established trust, rather than solely by technological capability.
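The gradient of trust described here amounts to a policy gate: actions judged low-risk run automatically, and everything else waits for a human. The sketch below is a hypothetical illustration. The action names and their risk labels are assumptions chosen for the example and do not describe how Resolve AI or its customers classify actions.

```python
from enum import Enum

class Risk(Enum):
    LOW = "low"
    HIGH = "high"

# Hypothetical classification; each organization would define its own.
ACTION_RISK = {
    "scale_up_replicas": Risk.LOW,
    "restart_stateless_service": Risk.LOW,
    "rollback_deployment": Risk.HIGH,
    "apply_code_patch": Risk.HIGH,
}

def decide(action: str, human_approved: bool = False) -> str:
    """Return 'execute' or 'await_approval' for a proposed agent action."""
    risk = ACTION_RISK.get(action, Risk.HIGH)  # unknown actions default to high risk
    if risk is Risk.LOW or human_approved:
        return "execute"
    return "await_approval"

print(decide("scale_up_replicas"))
print(decide("rollback_deployment"))
print(decide("rollback_deployment", human_approved=True))
```

As trust builds, an organization would move entries from the high-risk set to the low-risk set, which is one way to read the threshold Xanthos expects most actions to cross.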

The Argument: AI-Generated Code Exacerbates Production Challenges

Perhaps the most striking element of Resolve AI’s core proposition is the assertion that the surge in AI-generated code is intensifying production-operations issues rather than alleviating them. In a recent LinkedIn post, Xanthos articulated this dynamic starkly, arguing that engineering leaders who prioritize faster code delivery without commensurate investment in production operations are essentially having their senior engineers “subsidize velocity” through an increased burden of incident response.

During his interview, he reiterated this point: “Now that coding agents are producing code, we produce a lot more code that we’re less familiar with—humans are less familiar with—so you need the AI to be the defense.”

This perspective positions Resolve AI not merely as a productivity enhancer but as a necessary counterbalance to the AI coding revolution. As organizations deploy more code, potentially written by tools their engineers don’t fully comprehend and operating within production systems they didn’t architect, the argument follows that operational complexity—and the consequences of failure—will escalate proportionally. On the Stack Overflow Podcast last October, Xanthos quantified this issue, estimating that engineers dedicate upwards of 70 percent of their time to maintaining and troubleshooting production systems, diverting focus from new feature development. “We’re facing a new crisis where we’re building faster than we can operate,” he stated during that discussion.

Resolve AI was founded in early 2024 by Xanthos and Agarwal, who first collaborated during their PhD programs at the University of Illinois and have worked together for over a decade. Xanthos previously co-founded Pattern Insight (acquired by VMware) and Omnition (acquired by Splunk), where he and Agarwal were instrumental in the creation of OpenTelemetry. The company raised a $35 million seed round from Greylock in 2024, followed by the $125 million Series A led by Lightspeed at a $1 billion valuation earlier this year. Notable customers include Coinbase, DoorDash, MSCI, Salesforce, MongoDB, and Zscaler.

Xanthos envisions a long-term future where AI capabilities surpass those of human software engineers, leading to the creation of significantly more technology and software. “It’s not actually fewer people working on it. It’s technology becoming cheaper, becoming more accessible, producing a lot more technology for the benefit of the world.”

While this ambitious vision will unfold over years, the immediate promise of today’s announcement addresses a visceral pain point for on-call engineers: the dreaded 2 a.m. alert, the scramble for a laptop, and the frantic search through dashboards and logs for an answer. Resolve AI is betting that by the time the next critical alert sounds, a team of AI agents will have already investigated, verified, and documented the root cause before the engineer’s phone even lights up. For a profession historically defined by its Mean Time To Resolution, the crucial question is shifting from whether AI can assist to whether engineers will trust it to act.

Business Style Takeaway: Resolve AI’s expansion into multi-agent collaboration and persistent background operations signals a shift toward proactive, AI-driven production operations. It underscores the growing need for businesses to automate operational work so that the productivity gains from AI code generation are not offset by added risk in production environments.

Original article : venturebeat.com
