
Elastic Open-Sources Atlas Agent Memory Based on Cognitive Science
Quick Answer
Elastic has open-sourced Atlas, a memory system for agents built on Elasticsearch, achieving a 0.89 Recall@10 in question-answering.
Quick Take
Atlas maintains three types of memory (episodic, semantic, and procedural) so that agents get user-specific context without overloading the LLM prompt, which addresses the scalability problems of long-running interactions.
Key Points
- Atlas integrates with agents via MCP and maintains per-user memory isolation.
- Each memory type is stored in separate Elasticsearch indices for lifecycle management.
- Consolidation of memories updates procedural memory and creates new playbooks.
- Agents query memories using a hybrid approach that combines BM25 lexical search with Jina v5 semantic search.
- Atlas source code is available on GitHub for further exploration.
Elastic open-sourced Atlas, a system built on Elasticsearch that maintains three categories of memory for agents. Atlas integrates with agents via MCP and maintains per-user isolation of memories. When evaluated on question-answering capability, it scored 0.89 Recall@10.
Atlas addresses the problem of choosing which context to add to an agent's LLM prompt when a user has a long history of interacting with the agent. Loading the entire interaction history isn't a scalable solution, according to Elastic:
The standard workaround is to stuff prior context into the context window. That breaks down on cost, on latency, and on the well-documented "lost in the middle" effect, where models ignore facts placed far from the prompt's edges. A 1M-token context window is a scratchpad. It is not a memory system...What is missing is long-term memory: a persistent store that survives session end, scales to years of interaction, and lets you retrieve facts by content, by time, and by user.
The key concept in Atlas is that there are three types of memory identified by cognitive science: episodic, which captures "what happened;" semantic, "what's true;" and procedural, "what works." Atlas maintains separate Elasticsearch indices for each type of memory, since each type has its own rules and lifecycle.
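The split into one index per memory type can be pictured with a minimal sketch. The index names, the `MemoryRecord` fields, and the `target_index` helper below are hypothetical and chosen only for illustration; the article does not describe Atlas's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical index names: the article only says each memory type
# lives in its own Elasticsearch index.
INDEX_BY_TYPE = {
    "episodic": "atlas-episodic",      # "what happened"
    "semantic": "atlas-semantic",      # "what's true"
    "procedural": "atlas-procedural",  # "what works"
}


@dataclass
class MemoryRecord:
    user_id: str       # owner of the memory, used for per-user isolation
    memory_type: str   # "episodic", "semantic", or "procedural"
    text: str
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def target_index(record: MemoryRecord) -> str:
    """Pick the index for a record based on its memory type."""
    try:
        return INDEX_BY_TYPE[record.memory_type]
    except KeyError:
        raise ValueError(f"unknown memory type: {record.memory_type!r}")


record = MemoryRecord(user_id="u1", memory_type="semantic",
                      text="User prefers dark mode.")
print(target_index(record))
```

Keeping the routing in one table means each index can carry its own retention or decay policy without the writer code needing to know about it.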
Memories are created by storing each user input as an episodic memory event. Most of these events decay out of memory, although some "become evidence for durable facts" through consolidation, in which an LLM reviews the episodic events. The LLM identifies new facts, or semantic memories, and stores each as a short sentence, together with the supporting episodic memories as evidence and any previous facts that the new fact supersedes.
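The bookkeeping side of consolidation can be sketched as follows. This is an illustration under assumptions, not Atlas's implementation: the `Fact` fields, the `apply_consolidation` helper, and the choice to mark superseded facts inactive (rather than delete them) are all hypothetical, and the LLM call is left out, with `new_facts` standing in for its parsed output.

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    fact_id: str
    text: str                  # one short sentence
    evidence: list[str]        # ids of supporting episodic events
    supersedes: list[str] = field(default_factory=list)  # ids of older facts
    active: bool = True        # assumption: superseded facts are kept but retired


def apply_consolidation(store: dict[str, Fact], new_facts: list[Fact]) -> None:
    """Add newly extracted facts and retire any facts they supersede."""
    for fact in new_facts:
        for old_id in fact.supersedes:
            if old_id in store:
                store[old_id].active = False
        store[fact.fact_id] = fact


def active_facts(store: dict[str, Fact]) -> list[str]:
    """Return the text of every fact that has not been superseded."""
    return [f.text for f in store.values() if f.active]


store = {"f1": Fact("f1", "User lives in Berlin.", evidence=["e1"])}
# Stand-in for the LLM's output after it reads newer episodic events.
new_facts = [Fact("f2", "User lives in Lisbon.", evidence=["e7", "e8"],
                  supersedes=["f1"])]
apply_consolidation(store, new_facts)
print(active_facts(store))
```

Retaining the superseded fact with its evidence keeps an audit trail of why a belief changed, which is one plausible reason to store the supersession link at all.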
Consolidation also updates procedural memory in two ways. First, it creates new "playbooks," each a series of steps for solving a problem. Second, it updates success and failure counters for existing playbooks. These counts can bias retrieval results to boost playbooks that have been more successful.
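One way success and failure counters could bias ranking is to scale each playbook's relevance score by a smoothed success rate. The formula below is an assumption made for illustration; the article does not say how Atlas turns the counters into a boost, and `success_boost`, `boosted_score`, and `weight` are hypothetical names.

```python
def success_boost(successes: int, failures: int) -> float:
    """Laplace-smoothed success rate: 0.5 with no history, approaching 0 or 1
    as evidence accumulates."""
    return (successes + 1) / (successes + failures + 2)


def boosted_score(relevance: float, successes: int, failures: int,
                  weight: float = 0.5) -> float:
    """Scale relevance up or down around the neutral rate of 0.5.

    weight=0 ignores the counters entirely.
    """
    return relevance * (1.0 + weight * (success_boost(successes, failures) - 0.5))


# Two playbooks with the same relevance but different track records.
reliable = boosted_score(1.0, successes=9, failures=1)
flaky = boosted_score(1.0, successes=1, failures=9)
print(reliable > flaky)
```

Smoothing keeps a playbook with a single recorded success from outranking one with a long, mostly successful history.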
Agents access memories via a single hybrid query across all these indices that uses Reciprocal Rank Fusion (RRF) over BM25 lexical search plus Jina v5 semantic search; the merged results are re-ranked using a cross-encoder reranker. Document-level security (DLS) ensures that queries only search memory documents belonging to that user.
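Reciprocal Rank Fusion itself is simple enough to show in plain Python. The sketch below fuses two ranked lists of document ids by summing 1 / (k + rank) for each list a document appears in. The value k = 60 is a commonly used default and is an assumption here, as are the sample ids; Atlas's actual settings are not given in the article, and in Elasticsearch the fusion runs inside the search request rather than in client code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]],
                           k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of doc ids; higher fused score means better."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = ["m3", "m1", "m7"]    # hypothetical lexical results
vector_hits = ["m1", "m9", "m3"]  # hypothetical semantic results
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print([doc_id for doc_id, _ in fused])
```

Because RRF uses only ranks, it needs no calibration between BM25 scores and vector similarities, which live on different scales. A cross-encoder reranker would then reorder the top of this fused list.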
In a discussion about Atlas on Hacker News, some users wondered if using Elasticsearch as the storage was "overkill," and suggested other vector-capable databases such as SQLite. Another user replied:
"Any other vector DB" starts to fall apart once you need stuff like scripted scoring... Then it starts to be a question of, "do you need [Approximate Nearest Neighbor] for performance?"...And granted, brute-force is performant for far more vectors than most people give it credit for, but it definitely hits a wall well below 1 million if you want it to have webpage-type latency. Maintaining Elasticsearch isn't free, but picking an underpowered db and having to port to the right one is also quite time consuming.
The Atlas source code is available on GitHub.
About the Author
Anthony Alford
— Originally published at infoq.com

