Learning Agent-Compatible Context Management for Long-Horizon Tasks

arXiv cs.AI · Lu Yi, Runlin Lei, Liuyi Yao, Yuexiang Xie, Yuyang Li, Wenhao Zhang, Zhewei Wei, Yaliang Li, Jian-Yun Nie

6/1/2026

Quick Take

Adaptive Context Management (AdaCoM) improves LLM agents on long-horizon tasks. It trains an external LLM with reinforcement learning to manage a frozen agent's context, which yields substantial gains on web search and deep research benchmarks. The learned strategies reveal a Fidelity-Reliability Trade-off: higher-performing agents benefit from preserving context with high fidelity, while lower-performing agents need more aggressive compression to keep their reasoning reliable.

Key Points

AdaCoM trains an external LLM with end-to-end reinforcement learning to manage the context of a frozen LLM agent.
It substantially improves performance across diverse agents on web search and deep research benchmarks.
A Fidelity-Reliability Trade-off emerges: agents with higher vanilla ReAct performance benefit from higher-fidelity context preservation, while lower-performing agents need more aggressive compression.
Transfer experiments show AdaCoM generalizes most effectively across agents of similar capability, measured by vanilla ReAct performance.
Because the agent itself is never trained, the approach is practical for closed-source agents.

Article Content

arXiv:2605.30785v1

Abstract: LLM agents increasingly face long-horizon tasks such as web search and deep research in real-world applications, where accumulated context can cause long-context degradation and reasoning failures.

Prior work mitigates this through context management, using either agent-side context control or fixed strategies such as summarization. These approaches require training the agent itself to adapt, which is impractical for closed-source agents, and they ignore that different agents may require different strategies. We introduce Adaptive Context Management (AdaCoM), which trains an external LLM to manage the context of a frozen agent through flexible modification actions and end-to-end reinforcement learning.
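
The abstract does not spell out AdaCoM's action space or training loop. The Python sketch below shows only the general shape of the setup it describes: a frozen, black-box agent and a separate context manager that edits the agent's message history before every call. The `Message`, `Delete`, and `Replace` types, the `run_episode` loop, and the "FINAL ANSWER" stop convention are hypothetical illustrations. They are not AdaCoM's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Message:
    role: str     # "user", "assistant", or "tool"
    content: str


# Hypothetical modification actions; AdaCoM's real action set is not given in the abstract.
@dataclass
class Delete:
    index: int    # position in the context to drop


@dataclass
class Replace:
    index: int    # position in the context to rewrite (e.g. with a summary)
    new_content: str


Action = Union[Delete, Replace]
AgentFn = Callable[[List[Message]], Message]          # frozen, black-box agent
ManagerFn = Callable[[List[Message]], List[Action]]   # trainable context manager


def apply_actions(context: List[Message], actions: List[Action]) -> List[Message]:
    """Return an edited copy of `context`; indices refer to the unedited list."""
    edited = [Message(m.role, m.content) for m in context]
    deleted = set()
    for action in actions:
        if isinstance(action, Replace):
            edited[action.index].content = action.new_content
        elif isinstance(action, Delete):
            deleted.add(action.index)
    return [m for i, m in enumerate(edited) if i not in deleted]


def run_episode(task: str, agent: AgentFn, manager: ManagerFn,
                max_turns: int = 10) -> List[Message]:
    context = [Message("user", task)]
    for _ in range(max_turns):
        # The manager edits the context before every agent call;
        # the agent's weights are never modified.
        context = apply_actions(context, manager(context))
        reply = agent(context)
        context.append(reply)
        if reply.content.startswith("FINAL ANSWER"):  # hypothetical stop convention
            break
    return context
```

In this sketch only the manager would be updated during training. The agent is called as an opaque function, which is what makes such a setup usable with closed-source models. Because `apply_actions` returns a copy, the unedited history stays available for logging.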

Across diverse agents on web search and deep research benchmarks, AdaCoM substantially improves performance by preserving task constraints and progress while pruning stale content. The learned strategies reveal a Fidelity-Reliability Trade-off: agents with higher vanilla ReAct performance benefit from higher-fidelity context preservation, whereas lower-performing agents require more aggressive compression to stay within a reliable reasoning regime.
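
The following sketch makes the trade-off concrete. It uses a fixed, rule-based compressor as a stand-in for a learned strategy. Messages marked as pinned (task constraints and progress notes) always survive, and the oldest unpinned messages are dropped until the context fits a token budget. The budget acts as the fidelity knob: a generous budget keeps more of the history, while a tight one compresses aggressively. The `pinned` flag, the word-count token proxy, and the `compress` function are assumptions for illustration. AdaCoM learns its strategy rather than applying a fixed rule like this one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    role: str
    content: str
    pinned: bool = False  # hypothetical flag: task constraints / progress notes that must survive


def n_tokens(msg: Message) -> int:
    # Crude proxy: whitespace-separated words, not a real tokenizer.
    return len(msg.content.split())


def compress(context: List[Message], budget: int) -> List[Message]:
    """Drop the oldest unpinned messages until the context fits in `budget` tokens.

    A large budget approximates high-fidelity preservation; a small budget
    approximates aggressive compression. Pinned messages are never dropped,
    so the result can still exceed the budget if pinned content alone does.
    """
    kept = list(context)
    total = sum(n_tokens(m) for m in kept)
    i = 0
    while total > budget and i < len(kept):
        if kept[i].pinned:
            i += 1          # skip protected content
            continue
        total -= n_tokens(kept[i])
        kept.pop(i)         # oldest stale message goes first; i stays in place
    return kept
```

Under the reported trade-off, an agent with higher vanilla ReAct performance would correspond to a larger budget here, and a weaker agent to a smaller one. The abstract gives no numeric thresholds, so none are suggested.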

Transfer experiments show that AdaCoM generalizes most effectively across agents with similar capability (measured by vanilla ReAct performance), suggesting a practical path toward reusable context managers for agent systems.
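
The transfer finding suggests a simple reuse heuristic. When deploying a new agent, pick the already-trained context manager whose original agent had the closest vanilla ReAct score. The sketch below assumes a hypothetical registry mapping each manager's name to that score. It is one reading of the finding, not a procedure from the paper.

```python
from typing import Dict


def pick_manager(target_score: float, registry: Dict[str, float]) -> str:
    """Return the manager trained with the agent whose vanilla ReAct score is nearest.

    `registry` maps a (hypothetical) manager name to the vanilla ReAct score of
    the agent it was trained with; `target_score` is the new agent's score on
    the same benchmark, so the two are comparable.
    """
    if not registry:
        raise ValueError("registry is empty: no trained context managers to reuse")
    # Nearest neighbour in capability, measured by absolute score difference.
    return min(registry, key=lambda name: abs(registry[name] - target_score))
```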
