PEAM: Parametric Embodied Agent Memory through Contrastive Internalization of Experience in Minecraft
Quick Take
PEAM is a memory framework for Minecraft agents that pairs a slow deliberative LLM with a fast parametric module, improving long-horizon task performance and mitigating forgetting of consolidated skills. It decides when to consolidate through a self-triggered mechanism and treats failure as a training signal, and it reports better parametric-versus-retrieval efficiency than retrieval-based embodied agents.
Key Points
- PEAM uses a multimodal Mixture-of-Experts LoRA architecture for continual learning.
- The framework improves long-horizon task performance in Minecraft experiments.
- Failure-correction pairs are internalized through behavioral-cloning and contrastive objectives.
- PEAM's self-triggered consolidation mechanism transfers across task distributions without re-tuning.
- It improves parametric-versus-retrieval efficiency over retrieval-based embodied agents and parametric memory variants.
Article Content
From source RSS / original summary
arXiv:2605.27762v1
Abstract: We present PEAM, a Parametric Embodied Agent Memory framework in Minecraft that transforms agent memory from inference-time retrieval into parameter-resident skills internalized through experience. PEAM pairs a slow deliberative LLM for open-ended reasoning with a fast parametric module for reflexive execution of consolidated skills.
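The slow/fast pairing can be pictured as a simple dispatcher. The sketch below is illustrative only: `DualProcessAgent`, `consolidate`, and `act` are hypothetical names, policies are reduced to plain string-to-string functions, and the abstract does not say how PEAM actually routes between the two modules.

```python
from typing import Callable, Dict

# Assumption: a policy maps an observation string to an action string.
Policy = Callable[[str], str]


class DualProcessAgent:
    """Hypothetical dispatcher: use a consolidated fast skill if one exists,
    otherwise fall back to the slow deliberative reasoner."""

    def __init__(self, slow_reasoner: Policy) -> None:
        self.slow_reasoner = slow_reasoner
        self.fast_skills: Dict[str, Policy] = {}

    def consolidate(self, skill: str, policy: Policy) -> None:
        # Register a skill as parameter-resident (here: just store a callable).
        self.fast_skills[skill] = policy

    def act(self, skill: str, observation: str) -> str:
        # Reflexive execution when the skill is consolidated, reasoning otherwise.
        policy = self.fast_skills.get(skill, self.slow_reasoner)
        return policy(observation)
```

In this toy version an unconsolidated skill always goes to the slow reasoner; once `consolidate` is called for that skill, later calls bypass it.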
The fast module is a multimodal Mixture-of-Experts LoRA architecture with per-category physically isolated adapters, enabling parameter-level continual learning without catastrophic forgetting. We treat failure as a first-class training signal: failure-correction trajectory pairs are internalized through a joint behavioral-cloning and contrastive objective, so the agent learns not only what succeeds but also how corrected actions differ from failed ones.
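One way to read the joint objective is as a behavioral-cloning term on the corrected action plus a contrastive term that pushes the corrected action above the failed one. The sketch below assumes a single decision step with discrete actions, a margin-based hinge for the contrastive part, and a fixed mixing `weight`; none of these details come from the abstract, and `joint_loss` is a hypothetical name.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of action logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def joint_loss(
    logits: Sequence[float],
    corrected_action: int,
    failed_action: int,
    margin: float = 1.0,   # assumed hyperparameter
    weight: float = 0.5,   # assumed mixing coefficient
) -> float:
    """Behavioral cloning on the corrected action plus a hinge that asks the
    corrected action's log-probability to beat the failed one's by `margin`."""
    log_probs = log_softmax(logits)
    bc = -log_probs[corrected_action]
    gap = log_probs[corrected_action] - log_probs[failed_action]
    contrastive = max(0.0, margin - gap)
    return bc + weight * contrastive
```

With this form the loss is small when the policy already prefers the corrected action by a wide margin, and large when it still prefers the failed one, which matches the stated goal of learning how corrected actions differ from failed ones.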
To govern consolidation, PEAM introduces a parameterization-worthiness score for deciding which experience should be internalized, and a scale-free self-triggered consolidation mechanism for deciding when to internalize without task-specific hand-tuned thresholds, making the agent self-evolving as the trigger transfers across task distributions without re-tuning.
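The abstract does not give the trigger rule, so the following is only a sketch of what a threshold that needs no task-specific tuning could look like. It assumes each experience already carries a scalar worthiness score and fires when a new score is an outlier relative to the agent's own running statistics; `SelfTrigger`, `z_threshold`, and `warmup` are hypothetical names, and the z-score rule is a stand-in rather than PEAM's mechanism.

```python
import math


class SelfTrigger:
    """Hypothetical trigger: fire when a score exceeds the running mean by
    `z_threshold` running standard deviations. The cutoff is dimensionless,
    so rescaling all scores leaves the decisions unchanged."""

    def __init__(self, z_threshold: float = 2.0, warmup: int = 5) -> None:
        self.z_threshold = z_threshold
        self.warmup = warmup  # observations required before firing is allowed
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford)

    def observe(self, score: float) -> bool:
        fire = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and (score - self.mean) / std > self.z_threshold:
                fire = True
        # Welford's online update of mean and squared deviations.
        self.n += 1
        delta = score - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (score - self.mean)
        return fire
```

Because the comparison is made in units of the score's own standard deviation, the same `z_threshold` can be reused when a new task distribution produces scores on a different scale.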
Experiments in Minecraft show that PEAM improves long-horizon task performance, mitigates forgetting on previously consolidated skills, and improves parametric-versus-retrieval efficiency over retrieval-based embodied agents and parametric memory variants.
