SaliMory: Orchestrating Cognitive Memory for Conversational Agents
Quick Take
SALIMORY is a framework that trains a single language model to manage a conversational agent's long-term memory. It outperforms the state of the art by over 10% in end-to-end accuracy, cuts memory-attributed failures by one-third, and more than doubles the Good Personalization rate. A hierarchical stage-wise process reward gives each memory operation its own supervision signal.
Key Points
- SALIMORY trains a single language model for structured memory management.
- It reduces memory-attributed failures by one-third.
- It outperforms the state of the art by over 10% in end-to-end accuracy.
- It more than doubles the Good Personalization rate.
- A hierarchical stage-wise process reward provides isolated, end-to-end supervision for distinct memory operations: selective filtering, consolidation, and cue-driven recall.
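To make "structured memory management" concrete, here is a minimal sketch of a memory with the three parts the abstract lists (user facts, preferences, and working memory) and the three operations it names (selective filtering, consolidation, and cue-driven recall). The class name `CognitiveMemory`, its methods, and the keyword rules are hypothetical stand-ins: in SALIMORY a trained language model makes these decisions, not hand-written string matching.

```python
from dataclasses import dataclass, field


@dataclass
class CognitiveMemory:
    """Hypothetical memory with three stores; SALIMORY's real schema may differ."""
    facts: dict[str, str] = field(default_factory=dict)  # stable user facts
    preferences: set[str] = field(default_factory=set)  # things the user prefers
    working: list[str] = field(default_factory=list)  # kept but not yet structured

    def passes_filter(self, utterance: str) -> bool:
        """Selective filtering (toy rule): keep only first-person statements
        that look like facts or preferences; small talk is dropped."""
        return utterance.lower().startswith(("my ", "i like ", "i prefer "))

    def consolidate(self, utterance: str) -> None:
        """Consolidation (toy rule): file a kept utterance into the matching
        long-term store, or leave it in working memory if it cannot be parsed."""
        text = utterance.lower().rstrip(".")
        if text.startswith("my ") and " is " in text:
            key, value = text[len("my "):].split(" is ", 1)
            self.facts[key] = value
        elif text.startswith(("i like ", "i prefer ")):
            self.preferences.add(text.split(" ", 2)[2])
        else:
            self.working.append(utterance)

    def recall(self, cue: str) -> list[str]:
        """Cue-driven recall (toy rule): return entries sharing a word with the cue."""
        cue_words = set(cue.lower().split())
        hits = [f"{k}: {v}" for k, v in self.facts.items() if cue_words & set(k.split())]
        hits += [f"prefers {p}" for p in sorted(self.preferences) if cue_words & set(p.split())]
        return hits


mem = CognitiveMemory()
for turn in ["My dog's name is Rex.", "Nice weather today.", "I prefer window seats."]:
    if mem.passes_filter(turn):
        mem.consolidate(turn)

print(mem.recall("book window seats"))  # ['prefers window seats']
print(mem.recall("what is my dog's name"))  # ["dog's name: rex"]
```

The point of the sketch is the separation: because filtering, consolidation, and recall are distinct steps with distinct outputs, each can in principle be checked and rewarded on its own.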
Article Excerpt
From source RSS / original summary
arXiv:2606.04120v1 Announce Type: new
Abstract: Conversational agents that serve as lifelong companions must maintain persistent memory across all interactions. However, simply expanding context windows with raw retrieval degrades reasoning quality, while training memory agents via standard reinforcement learning creates a severe credit assignment bottleneck in a multi-stage pipeline.
To solve this, we introduce SALIMORY, a framework that trains a single language model to manage a cognitively structured memory spanning user facts, preferences, and working memory. By introducing a hierarchical stage-wise process reward and reward-decomposed contrastive refinement, SALIMORY provides isolated supervision for distinct memory operations (selective filtering, consolidation, and cue-driven recall) end-to-end.
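The credit assignment problem described above, and the idea of isolating supervision per memory operation, can be illustrated with a small sketch. It assumes each rollout has already been judged pass/fail at each of the three stages; the `Rollout` type, the binary judgments, and the pair-building rule are hypothetical. The abstract does not define the hierarchical reward or the contrastive refinement objective, so this shows only a flat per-stage decomposition and one plausible way to turn it into stage-specific preference pairs.

```python
from typing import NamedTuple

STAGES = ("filter", "consolidate", "recall")


class Rollout(NamedTuple):
    outputs: dict[str, str]  # what the model produced at each stage
    stage_ok: dict[str, bool]  # hypothetical per-stage pass/fail judgment


def outcome_reward(r: Rollout) -> dict[str, float]:
    """Outcome-only signal: one terminal reward is broadcast to every stage,
    so a recall error also penalizes correct filtering and consolidation."""
    final = 1.0 if all(r.stage_ok[s] for s in STAGES) else 0.0
    return {s: final for s in STAGES}


def stagewise_reward(r: Rollout) -> dict[str, float]:
    """Decomposed signal: each stage is scored only on its own judgment."""
    return {s: 1.0 if r.stage_ok[s] else 0.0 for s in STAGES}


def contrastive_pairs(a: Rollout, b: Rollout) -> list[tuple[str, str, str]]:
    """For every stage where the two rollouts' stage rewards differ, emit
    (stage, preferred_output, rejected_output); agreeing stages yield nothing."""
    ra, rb = stagewise_reward(a), stagewise_reward(b)
    pairs = []
    for s in STAGES:
        if ra[s] != rb[s]:
            better, worse = (a, b) if ra[s] > rb[s] else (b, a)
            pairs.append((s, better.outputs[s], worse.outputs[s]))
    return pairs


shared = {"filter": "keep: 'I'm allergic to peanuts'", "consolidate": "facts['allergy'] = 'peanuts'"}
a = Rollout({**shared, "recall": "cue 'dinner ideas' -> nothing"},
            {"filter": True, "consolidate": True, "recall": False})
b = Rollout({**shared, "recall": "cue 'dinner ideas' -> allergy: peanuts"},
            {"filter": True, "consolidate": True, "recall": True})

print(outcome_reward(a))  # {'filter': 0.0, 'consolidate': 0.0, 'recall': 0.0}
print(stagewise_reward(a))  # {'filter': 1.0, 'consolidate': 1.0, 'recall': 0.0}
print(contrastive_pairs(a, b))  # one pair, for the 'recall' stage only
```

Under the outcome-only signal, the correct filtering and consolidation steps in `a` receive the same zero as the failed recall. The decomposed signal localizes the failure to recall, and the pair builder produces training signal only for that stage.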
SALIMORY cuts memory-attributed failures by one-third, outperforms the state-of-the-art by over 10% in end-to-end accuracy, and more than doubles the Good Personalization rate.