WorldLines: Benchmarking and Modeling Long-Horizon Stateful Embodied Agents
Quick Answer
WorldLines is a benchmark for long-horizon embodied household assistance. It tests whether agents can use long-term memory of user routines, world states, and past interactions.
Quick Take
The benchmark targets partial observability and world states that change over time. The authors also propose ObsMem, an observer-grounded memory framework that maintains visibility-aware memories. In experiments, partial observability, overwritten world states, and translating long-term memory into embodied plans remain persistent challenges, while ObsMem serves as a stronger reference architecture for this setting.
Key Points
- WorldLines benchmarks long-horizon embodied agents for household assistance.
- ObsMem is an observer-grounded memory framework that maintains visibility-aware memories and action-native state trails.
- Experiments reveal persistent challenges in translating long-term memory into embodied plans.
- Tests long-term memory use in dynamic environments, rather than the language-centric retrieval and question answering of existing memory benchmarks.
- Exposes further challenges such as overwritten world states and partial observability.
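The summary describes ObsMem only at a high level. To illustrate what "visibility-aware memories and action-native state trails" (the abstract's wording) could mean, here is a minimal Python sketch. It assumes that (a) the agent writes to memory only what is in its field of view, and (b) state changes caused by the agent's own actions are logged even without re-observation. The classes `MemoryEntry` and `ObserverMemory` and their methods are hypothetical. This is not the authors' ObsMem implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MemoryEntry:
    # Hypothetical record of one belief about one object attribute.
    obj: str
    attr: str
    value: str
    step: int    # time step at which the belief was formed
    source: str  # "observed" (seen directly) or "own_action" (caused by the agent)


@dataclass
class ObserverMemory:
    # Hypothetical observer-grounded store; NOT the authors' ObsMem code.
    beliefs: Dict[Tuple[str, str], MemoryEntry] = field(default_factory=dict)
    trail: List[MemoryEntry] = field(default_factory=list)  # log of the agent's own state changes

    def observe(self, step: int, visible: List[str], world: Dict[str, Dict[str, str]]) -> None:
        # Visibility-aware write: only objects currently in view update memory.
        for obj in visible:
            for attr, value in world.get(obj, {}).items():
                self.beliefs[(obj, attr)] = MemoryEntry(obj, attr, value, step, "observed")

    def act(self, step: int, obj: str, attr: str, new_value: str) -> None:
        # The agent knows the effect of its own action without re-observing it.
        entry = MemoryEntry(obj, attr, new_value, step, "own_action")
        self.beliefs[(obj, attr)] = entry
        self.trail.append(entry)

    def query(self, obj: str, attr: str, now: int) -> Optional[Tuple[str, str, int]]:
        # Returns (value, source, age in steps), or None if the agent never saw it.
        entry = self.beliefs.get((obj, attr))
        if entry is None:
            return None
        return entry.value, entry.source, now - entry.step


world = {"mug": {"location": "sink"}, "tv": {"power": "off"}}
mem = ObserverMemory()
mem.observe(step=0, visible=["mug"], world=world)  # the TV is out of view
mem.act(step=3, obj="mug", attr="location", new_value="dish_rack")
world["tv"]["power"] = "on"  # someone else turns the TV on, unseen

print(mem.query("mug", "location", now=5))  # ('dish_rack', 'own_action', 2)
print(mem.query("tv", "power", now=5))      # None
```

The TV was never in view, so the memory holds no belief about it even after its state changes. This is the partial-observability case, where reporting "unknown" is safer than guessing. Tagging each belief with its source and age is one way an agent could decide when to re-check a fact before acting on it.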
Article Excerpt
arXiv:2606.18847v1 (Announce Type: new)
Abstract: To assist humans over extended periods in real homes, embodied agents must remember user routines, world states, and past interactions. Existing long-term memory benchmarks mainly evaluate language-centric retrieval and question answering, while embodied benchmarks often focus on short-horizon task execution without testing long-term memory use in dynamic environments. We introduce WorldLines, a project-driven benchmark for long-horizon embodied household assistance.
It constructs temporally extended household traces with dialogues, actions, execution feedback, object and device state changes, and converts them into evidence-linked samples for Memory QA and Embodied Task Planning. We further propose ObsMem, an observer-grounded memory framework that maintains visibility-aware memories and action-native state trails for state-aware decisions.
Experiments reveal persistent challenges in partial observability, overwritten world states, and translating long-term memory into embodied plans, while ObsMem offers a stronger reference architecture for this setting.
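One challenge the abstract names is overwritten world states. In a long trace, the same attribute takes several values over time, and a memory that keeps only the latest value cannot answer questions about earlier moments. The Python sketch below contrasts such a memory with a per-attribute timeline. The timeline returns the value valid at a given step together with the id of the event that set it, loosely mirroring the idea of "evidence-linked samples". The trace, event ids, and `StateTimeline` class are hypothetical illustrations, not the WorldLines data format.

```python
import bisect
from typing import Dict, List, Optional, Tuple

# Hypothetical toy trace: (step, event_id, object, attribute, value).
TRACE = [
    (2, "e1", "keys", "location", "hallway_table"),
    (9, "e2", "keys", "location", "coat_pocket"),
    (15, "e3", "keys", "location", "kitchen_drawer"),
]


class StateTimeline:
    """Hypothetical store that keeps every value an attribute took, instead of overwriting it."""

    def __init__(self) -> None:
        self._steps: Dict[Tuple[str, str], List[int]] = {}
        self._values: Dict[Tuple[str, str], List[Tuple[str, str]]] = {}

    def record(self, step: int, event_id: str, obj: str, attr: str, value: str) -> None:
        key = (obj, attr)
        steps = self._steps.setdefault(key, [])
        values = self._values.setdefault(key, [])
        i = bisect.bisect_right(steps, step)  # keep entries sorted by step
        steps.insert(i, step)
        values.insert(i, (value, event_id))

    def state_at(self, obj: str, attr: str, step: int) -> Optional[Tuple[str, str]]:
        """Value valid at `step`, plus the id of the event that set it (the evidence)."""
        steps = self._steps.get((obj, attr), [])
        i = bisect.bisect_right(steps, step)
        if i == 0:
            return None  # nothing recorded yet at that time
        return self._values[(obj, attr)][i - 1]


timeline = StateTimeline()
latest: Dict[Tuple[str, str], str] = {}  # naive memory: overwrites old values
for step, eid, obj, attr, value in TRACE:
    timeline.record(step, eid, obj, attr, value)
    latest[(obj, attr)] = value

# "Where were the keys at step 10?"
print(latest[("keys", "location")])               # kitchen_drawer (wrong for step 10)
print(timeline.state_at("keys", "location", 10))  # ('coat_pocket', 'e2')
print(timeline.state_at("keys", "location", 1))   # None
```

The naive dictionary can only report the final location. The timeline answers the step-10 question correctly and points to event `e2` as supporting evidence.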