Scaling Self-Evolving Agents via Parametric Memory

arXiv cs.AI·Tao Ren, Weiyao Luo, Hui Yang, Rongzhi Zhu, Xiang Huang, Yuchuan Wu, Bingxue Chou, Jieping Ye, Jiafeng Liang, Yongbin Li, Yijie Peng

6/4/2026

Quick Take

TMEM is a self-evolving parametric memory framework for LLM agents: besides compressing history into explicit memory, the agent absorbs distilled supervision into fast LoRA weights through lightweight online updates, so its behavior changes within a single episode. In experiments on LoCoMo, LongMemEval-S, multi-objective search, and CL-Bench, TMEM consistently outperforms summary-based and retrieval-based baselines across model scales.

Key Points

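TMEM agents compress history into explicit memory and also absorb distilled supervision into fast LoRA weights.
Actions are sampled from the policy with the current fast weights, and extraction actions produce the supervision that updates those weights for later decisions.
Because the extraction policy is part of the decision process, RL training of the base parameters also improves the data used for online LoRA adaptation.
SVD-based initialization of the LoRA subspace is proposed to accelerate online convergence.
TMEM outperforms summary-based and retrieval-based baselines on LoCoMo, LongMemEval-S, multi-objective search, and CL-Bench.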

Article Content

From source RSS / original summary

arXiv:2606.04536v1. Abstract: Existing memory-augmented LLM agents store past experience exclusively in prompt space, as textual summaries or retrieved passages, while keeping model parameters frozen throughout a rollout. Such agents can look up what they have seen but cannot learn from it: their policy is unchanged by experience, and any information dropped from the context is permanently lost.

We introduce TMEM, a self-evolving parametric memory framework in which the agent not only compresses history into explicit memory but also absorbs distilled supervision into fast LoRA weights $\Delta_t$ via lightweight online updates, genuinely altering its future behavior within a single episode.
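To make the idea of absorbing supervision into fast LoRA weights concrete, here is a minimal sketch on a toy linear layer. It is not the paper's implementation: the class `FastWeightLinear`, the helper `absorb`, the squared-error loss, the learning rate, and the toy data are all assumptions chosen for illustration. The base weights `W0` stay frozen while the low-rank factors `A` and `B` take one gradient step per supervision item.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


class FastWeightLinear:
    """Toy layer y = (W0 + B A) x. W0 is frozen; only A and B are updated."""

    def __init__(self, W0, A, B):
        self.W0 = [row[:] for row in W0]  # frozen base weights (stand-in for theta_0)
        self.A = [row[:] for row in A]    # LoRA down-projection, shape (r, d_in)
        self.B = [row[:] for row in B]    # LoRA up-projection, shape (d_out, r)

    def forward(self, x):
        h = matvec(self.A, x)
        return [b + d for b, d in zip(matvec(self.W0, x), matvec(self.B, h))]

    def online_update(self, x, target, lr=0.05):
        """One SGD step on 0.5 * ||y - target||^2 with respect to A and B only."""
        h = matvec(self.A, x)
        err = [y - t for y, t in zip(self.forward(x), target)]
        # Gradients: dL/dB[i][k] = err[i] * h[k], dL/dA[k][j] = (B^T err)[k] * x[j]
        bt_err = [sum(self.B[i][k] * err[i] for i in range(len(err)))
                  for k in range(len(h))]
        for i, e in enumerate(err):
            for k, hk in enumerate(h):
                self.B[i][k] -= lr * e * hk
        for k, g in enumerate(bt_err):
            for j, xj in enumerate(x):
                self.A[k][j] -= lr * g * xj
        return 0.5 * sum(e * e for e in err)  # loss measured before the step


def absorb(layer, supervision, lr=0.05):
    """Apply one online update per (x, target) pair; return per-step losses."""
    return [layer.online_update(x, t, lr) for x, t in supervision]


# Hypothetical toy data: the same supervision item repeated 200 times.
layer = FastWeightLinear(W0=[[0.5, 0.5]], A=[[0.1, 0.1]], B=[[0.0]])
losses = absorb(layer, [([1.0, 2.0], [3.0])] * 200)
print(losses[0], losses[-1])
```

With `B` starting at zero the layer initially matches the frozen base model, and the loss falls as the fast weights adapt, which is the sense in which the model's behavior changes without touching `W0`.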

We formalize this as an agentic decision process with fast-weight rollout dynamics: actions are sampled from $\pi_{\theta_0+\Delta_t}$, while extraction actions produce supervision that updates $\Delta_t$ for subsequent decisions. This view makes the extraction policy directly optimizable by RL: training $\theta_0$ improves not only task actions but also the quality of the data used for online LoRA adaptation. We further propose SVD-based initialization of the LoRA subspace to accelerate online convergence.
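The dynamics described above can be written out as a small set of equations. This is a sketch under stated assumptions, not the paper's formulation: the state $s_t$, the extracted supervision $\mathcal{D}_t$, the adaptation loss $\mathcal{L}$, the step size $\eta$, the frozen matrix $W_0$, and the rank $r$ are notation introduced here for illustration. The abstract also does not say which matrix is decomposed or how the factors are scaled, so the last line shows only one hypothetical choice, in which the subspace comes from the top-$r$ right singular vectors and $B_0 = 0$ keeps the initial fast weights at zero.

```latex
% Hypothetical notation: s_t (state/context), \mathcal{D}_t (supervision extracted at step t),
% \mathcal{L} (adaptation loss), \eta (step size), W_0 (a frozen weight matrix), r (LoRA rank).
\begin{align}
  a_t &\sim \pi_{\theta_0 + \Delta_t}(\cdot \mid s_t), \qquad \Delta_t = B_t A_t, \\
  (A_{t+1}, B_{t+1}) &= (A_t, B_t)
     - \eta \, \nabla_{(A, B)} \mathcal{L}\bigl(\theta_0 + B A;\ \mathcal{D}_t\bigr)\Big|_{(A_t, B_t)}, \\
  W_0 &= U \Sigma V^\top, \qquad A_0 = V_{:,\,1:r}^\top, \qquad B_0 = 0
     \;\Rightarrow\; \Delta_0 = 0 .
\end{align}
```

The first line is the sampling rule stated in the abstract, the second is a plain gradient step standing in for the "lightweight online updates", and the third is the assumed initialization.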

Experiments on LoCoMo, LongMemEval-S, multi-objective search, and CL-Bench show that TMEM consistently outperforms summary-based and retrieval-based baselines across different model scales.

