Agentic Memory Breakthrough: 118K Tokens Per Query Outperforms 3.26M LangMem

Long-horizon reasoning presents a significant hurdle for AI agents, primarily due to rapidly filling context windows and the tendency for retrieval pipelines to return irrelevant data rather than actionable insights.

To address this challenge, researchers at the National University of Singapore have introduced MRAgent, a framework that departs from the conventional static “retrieve-then-reason” paradigm. Instead of fetching everything up front, the agent reconstructs its memory incrementally as evidence accumulates, and this multi-step reconstruction is integrated into the reasoning loop of the large language model (LLM). MRAgent is not the only approach in this evolving field, but in the researchers' tests it reduced token consumption and operational costs compared with existing agent memory management approaches.

The Limitations of Passive Retrieval in Complex Tasks

Traditional retrieval pipelines typically fetch documents via vector search or graph traversal and then feed them into an LLM for analysis. This passive methodology falls short in long-horizon scenarios because reasoning and memory access never inform each other, which leads to three key bottlenecks:

These systems lack the ability to adapt their retrieval strategy mid-reasoning. If an agent retrieves a document and identifies a critical missing piece of information—such as a specific date or individual—it cannot initiate a new query informed by this discovery.
Fixed similarity scores and predetermined graph expansions often result in the retrieval of surface-level matches. This floods the LLM’s context window with extraneous information, thereby degrading the quality of reasoning.
Current systems heavily rely on pre-established structures like top-k results and static relevance functions, limiting the agility required to manage unpredictable, long-term user interactions effectively.

The researchers advocate for a shift towards an “active and associative reconstruction process,” drawing inspiration from cognitive neuroscience, to overcome these limitations.

In this revised approach, memory recall is a sequential process, rather than a static retrieval from a database. The system begins with precise triggers from the user’s prompt, such as names, actions, or locations. These initial cues guide the agent toward related concepts or categories, rather than directly accessing large volumes of text. By navigating these “metadata stepping stones,” the agent gathers small pieces of evidence iteratively, using each new insight to inform the next step until the complete and accurate context is established.

MRAgent’s Implementation of Active Memory Reconstruction

MRAgent conceptualizes memory not as a static repository, but as an interactive environment. When presented with a complex query, the agent leverages the LLM’s inherent reasoning capabilities to explore multiple potential retrieval pathways within a structured memory graph. At each stage, the LLM assesses the evidence gathered so far, using it to refine its search iteratively. This involves inferring new search constraints, pursuing the most promising information paths, and discarding irrelevant branches, thereby allowing MRAgent to uncover deeply embedded information without overwhelming the LLM’s context window.

To ensure computational efficiency and scalability, the framework organizes its data using a “Cue-Tag-Content” mechanism, structured as a multi-layered associative graph with three distinct node types:

Cues: These are granular keywords, such as entities or contextual attributes, extracted directly from user interactions.
Content: This represents the actual stored memory units, segmented into layers based on granularity. This includes episodic memory for specific events and semantic memory for stable facts and user preferences.
Tags: These function as semantic bridges, summarizing the relational associations between specific Cues and Content nodes.
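The three node types can be pictured with a small data model. The sketch below is only an illustration under assumptions: the class names, fields, and the `link` and `tags_for` helpers are hypothetical and are not taken from the MRAgent code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Content:
    """A stored memory unit (hypothetical shape)."""
    text: str
    layer: str  # "episodic" for specific events, "semantic" for stable facts


@dataclass
class Tag:
    """A short summary that bridges a cue and the content behind it."""
    summary: str
    contents: List[Content] = field(default_factory=list)


@dataclass
class MemoryGraph:
    """Maps each cue keyword to the tags reachable from it."""
    cue_to_tags: Dict[str, List[Tag]] = field(default_factory=dict)

    def link(self, cue: str, tag: Tag) -> None:
        # Cues are stored lower-cased so lookups are case-insensitive.
        self.cue_to_tags.setdefault(cue.lower(), []).append(tag)

    def tags_for(self, cue: str) -> List[Tag]:
        # An unknown cue simply has no tags.
        return self.cue_to_tags.get(cue.lower(), [])
```

Keeping Tags as separate, short objects is what would let an agent inspect a summary without loading the Content attached to it.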

This architecture facilitates a highly efficient, two-stage retrieval process. Initially, the LLM navigates from Cues to candidate Tags. Because Tags explicitly detail semantic relationships and data structures, the agent can evaluate these concise summaries for relevance. The LLM identifies promising pathways and discards irrelevant branches before accessing the detailed, resource-intensive memory contents. This prevents unnecessary computation and token usage.
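A minimal sketch of this two-stage idea follows. It assumes a toy in-memory graph and a caller-supplied `tag_is_relevant` function standing in for the LLM's judgment; the graph contents, function names, and return shape are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy graph: cue -> {tag summary -> list of content strings}
GRAPH: Dict[str, Dict[str, List[str]]] = {
    "nate": {
        "Tournament Victory": [
            "Nate won his third tournament and saved the prize money."
        ],
        "Tournament Participation": [
            "Nate signed up for a local tournament."
        ],
    }
}


def two_stage_retrieve(
    cues: List[str],
    graph: Dict[str, Dict[str, List[str]]],
    tag_is_relevant: Callable[[str], bool],
) -> Tuple[List[str], List[str], List[str]]:
    """Stage 1 judges short tag summaries; stage 2 loads content for kept tags only."""
    kept: List[str] = []
    pruned: List[str] = []
    contents: List[str] = []
    for cue in cues:
        for tag, items in graph.get(cue.lower(), {}).items():
            if tag_is_relevant(tag):
                kept.append(tag)
                contents.extend(items)  # content is touched only after the tag passes
            else:
                pruned.append(tag)  # pruned before any content is read
    return kept, pruned, contents
```

In a real system the relevance check would be an LLM call over the tag summaries. A keyword test such as `lambda tag: "victory" in tag.lower()` is enough to exercise the control flow.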

Consider this user query: “How did Nate use the prize money when he won his third video game tournament?”

MRAgent first identifies key cues from the prompt: “Nate,” “video game tournament,” and “win.”
The agent maps these cues to the memory graph, examining associated Tags. It identifies tags like “Tournament Victory” and “Tournament Participation.” Since the query pertains to post-victory actions, MRAgent prioritizes the “Victory” tag, discarding “Participation.”
The agent then retrieves episodic content linked to the selected Cue-Tag pair, uncovering three distinct memory episodes of Nate winning tournaments.
MRAgent analyzes these three memories, selects the most relevant one, and discards the others. It then updates its cues with information from the retrieved memory, such as “tournament earnings,” using this to explore new tags and identify further relevant memories. This iterative process continues until sufficient information is gathered to answer the query, for instance, “Nate saved the money.”
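The walkthrough above can be sketched as a loop. This is a simplified illustration, not the authors' implementation: the `MEMORY` layout, the “Use of Money” tag, and the rule-based `pick_tag`, `pick_episode`, and `is_sufficient` functions are hypothetical stand-ins for decisions the LLM would make.

```python
# Hypothetical toy memory: (cue, tag) -> episodes, each with cues it can add.
MEMORY = {
    ("nate", "Tournament Victory"): [
        {"text": "Nate won his first tournament.", "new_cues": []},
        {"text": "Nate won his second tournament.", "new_cues": []},
        {"text": "Nate won his third tournament and received tournament earnings.",
         "new_cues": ["tournament earnings"]},
    ],
    ("nate", "Tournament Participation"): [
        {"text": "Nate entered a tournament.", "new_cues": []},
    ],
    ("tournament earnings", "Use of Money"): [
        {"text": "Nate saved the money.", "new_cues": []},
    ],
}


def pick_tag(cue, tags):
    """Stand-in for the LLM: prefer a 'Victory' tag, else take the first one."""
    if not tags:
        return None
    for tag in tags:
        if "Victory" in tag:
            return tag
    return tags[0]


def pick_episode(episodes):
    """Stand-in for the LLM: prefer the episode mentioning the third win."""
    for episode in episodes:
        if "third" in episode["text"]:
            return episode
    return episodes[0]


def is_sufficient(evidence):
    """Stand-in stop condition: the evidence says what happened to the money."""
    return any("saved" in text for text in evidence)


def reconstruct(initial_cues, memory, pick_tag, pick_episode, is_sufficient, max_steps=5):
    cues, seen, evidence = list(initial_cues), set(), []
    for _ in range(max_steps):
        pending = [c for c in cues if c not in seen]
        if not pending or is_sufficient(evidence):
            break  # nothing left to explore, or enough evidence gathered
        cue = pending[0]
        seen.add(cue)
        tag = pick_tag(cue, [t for (c, t) in memory if c == cue])
        if tag is None:
            continue  # this cue leads nowhere in the graph
        episode = pick_episode(memory[(cue, tag)])
        evidence.append(episode["text"])
        cues.extend(episode["new_cues"])  # new cues drive the next step
    return evidence
```

Calling `reconstruct(["nate", "video game tournament", "win"], MEMORY, pick_tag, pick_episode, is_sufficient)` on this toy data collects the third-tournament episode first, then follows the new “tournament earnings” cue to the episode stating that Nate saved the money.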

MRAgent Performance on Industry Benchmarks

MRAgent competes with several other frameworks designed for building agentic memory systems, including A-MEM (a graph-based framework), MemoryOS (a hierarchical memory framework), LangMem, and Mem0 (persistent memory solutions).

The researchers evaluated MRAgent using the LoCoMo and LongMemEval benchmarks, which assess the ability of agents to handle long-horizon tasks and complex conversational data across numerous sessions and dialogue turns. Utilizing Gemini 2.5 Flash and Claude Sonnet 4.5 as backbone models, MRAgent was benchmarked against standard RAG (Retrieval-Augmented Generation), A-MEM, MemoryOS, LangMem, and Mem0.

MRAgent consistently surpassed all baseline methods across both backbone models and various question types by a significant margin. For enterprise developers, however, computational efficiency is just as important. In the LongMemEval tests, MRAgent reduced prompt token consumption to an average of 118k per sample, compared with 632k tokens for A-MEM and 3.26 million tokens for LangMem. MRAgent also nearly halved the processing runtime relative to A-MEM, from 1,122 seconds to 586 seconds.
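To put the reported figures in proportion, the short calculation below derives the ratios directly from the numbers quoted above. The function names are illustrative, and no figures beyond those in the article are used.

```python
def ratio(baseline: float, mragent: float) -> float:
    """How many times larger the baseline figure is than MRAgent's."""
    return baseline / mragent


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


# Prompt tokens per sample on LongMemEval, as reported in the article.
MRAGENT_TOKENS = 118_000
A_MEM_TOKENS = 632_000
LANGMEM_TOKENS = 3_260_000

print(f"A-MEM uses {ratio(A_MEM_TOKENS, MRAGENT_TOKENS):.1f}x the tokens")
print(f"LangMem uses {ratio(LANGMEM_TOKENS, MRAGENT_TOKENS):.1f}x the tokens")
print(f"Runtime vs A-MEM drops by {percent_reduction(1122, 586):.1f}%")
```

On these figures, A-MEM consumes roughly 5.4 times and LangMem roughly 27.6 times as many prompt tokens as MRAgent, and the runtime reduction relative to A-MEM is about 48 percent.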

MRAgent’s practical efficiency stems from its on-demand operation. By evaluating tags and pruning irrelevant paths prior to full data retrieval, the system conserves computational resources and valuable context space. Moreover, the agent autonomously assesses its accumulated context, determining precisely when to cease searching and thereby preventing redundant data exploration.

Implementation Considerations and Development Approach

While MRAgent demonstrates superior performance, the Cue-Tag-Content structure requires pre-configuration before an agent can query it. Developers must architect the underlying memory database to facilitate efficient navigation of associative items and effective pruning of irrelevant paths without incurring prohibitive computational costs.

Fortunately, manual data labeling or structuring is not necessary. The MRAgent framework includes an automated distillation pipeline that leverages LLMs to process raw interaction histories and automatically populate the memory graph. The developer's job is to implement and orchestrate this automated ingestion pipeline, not to tag data by hand.

This involves setting up a background or streaming pipeline that feeds raw user interactions through prompt templates for metadata extraction; the extracted metadata is then stored in a graph database. The authors emphasize that this ingestion process is designed to be straightforward, keeping the setup phase manageable.
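As a rough sketch of what such an ingestion step could look like, the example below passes each raw turn through a prompt template and stores the extracted metadata in a nested dictionary that stands in for a graph database. The template wording, the JSON schema, and the `call_llm` parameter are assumptions made for illustration; `fake_llm` is a stub, not a real model client, and none of this is taken from the MRAgent repository.

```python
import json
from collections import defaultdict

# Hypothetical prompt template; the real pipeline's prompts are not shown in the article.
PROMPT_TEMPLATE = (
    "Extract cues (keywords), a one-line tag, and the memory content "
    "from this interaction. Reply as JSON with keys cues, tag, content.\n\n{turn}"
)


def ingest(turns, call_llm, graph=None):
    """Feed raw turns through the template and file the results as cue -> tag -> contents."""
    if graph is None:
        graph = defaultdict(lambda: defaultdict(list))
    for turn in turns:
        reply = call_llm(PROMPT_TEMPLATE.format(turn=turn))
        record = json.loads(reply)  # assumes the model returns valid JSON
        for cue in record["cues"]:
            graph[cue.lower()][record["tag"]].append(record["content"])
    return graph


def fake_llm(prompt):
    """Stub used only to exercise the pipeline; it echoes the turn back as content."""
    turn = prompt.split("\n\n", 1)[1]
    return json.dumps(
        {"cues": ["Nate", "tournament"], "tag": "Tournament Victory", "content": turn}
    )
```

Passing the model call in as a parameter keeps the pipeline testable offline. A production version would also need to handle malformed model output and write to a persistent graph store.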

The research code has been made publicly available on GitHub.

Business Style Takeaway: MRAgent’s innovative approach to agentic memory management significantly enhances AI’s ability to handle complex, long-duration tasks by optimizing information retrieval and reducing computational overhead. This development is crucial for enterprises seeking to deploy more efficient and cost-effective AI agents in customer service, data analysis, and complex problem-solving scenarios.

Original article: venturebeat.com
