Learning Agent-Compatible Context Management for Long-Horizon Tasks
Quick Take
Adaptive Context Management (AdaCoM) enhances long-horizon task performance for LLM agents by managing context through an external LLM, significantly improving results on web search and deep research benchmarks. It reveals a Fidelity-Reliability Trade-off, where higher-performing agents benefit from context preservation, while lower-performing agents need aggressive compression to maintain reasoning reliability.
Key Points
- AdaCoM uses end-to-end reinforcement learning for context management in LLM agents.
- Significant performance improvements observed on benchmarks for web search and deep research tasks.
- Fidelity-Reliability Trade-off indicates varying context management needs based on agent performance.
- Transfer experiments show AdaCoM generalizes well across agents with similar capabilities.
- Practical for closed-source agents, avoiding the need for agent-specific training.
Article Content
From source RSS / original summaryarXiv:2605. 30785v1 Announce Type: new Abstract: LLM agents increasingly face long-horizon tasks such as web search and deep research in real-world applications, where accumulated context can cause long-context degradation and reasoning failures.
Prior work mitigates this through context management with agent-side context control or fixed strategies such as summarization, which require training the agent itself for adaptation - making it impractical for closed-source agents and ignoring that different agents may require different strategies. We introduce Adaptive Context Management (AdaCoM), which trains an external LLM to manage the context of a frozen agent through flexible modification actions and end-to-end reinforcement learning.
Across diverse agents on web search and deep research benchmarks, AdaCoM substantially improves performance by preserving task constraints and progress while pruning stale content. The learned strategies reveal a Fidelity-Reliability Trade-off: agents with higher vanilla ReAct performance benefit from higher-fidelity context preservation, whereas lower-performing agents require more aggressive compression to stay within a reliable reasoning regime.
Transfer experiments show that AdaCoM generalizes most effectively across agents with similar capability (measured by vanilla ReAct performance), suggesting a practical path toward reusable context managers for agent systems.
Reader Mode unavailable (could not extract clean content).
Want this in your inbox every morning?
Daily brief at your local 8am — bilingual EN/中文, free.
More from arXiv cs.AI
See more →The Importance of Out-of-Band Metadata for Safe Autonomous Agents: The Redpanda Agentic Data Plane
The Redpanda Agentic Data Plane (ADP) introduces out-of-band metadata channels to enhance the safety of autonomous AI agents, ensuring secure data access and tamper-proof audit trails. This architecture mitigates risks associated with unpredictable AI behavior by enforcing governance throughout the agent lifecycle, demonstrated in a multi-agent trading system with strict data scoping and approval thresholds.