Beyond Compaction: Structured Context Eviction for Long-Horizon Agents
Quick Answer
This paper presents Context Window Lifecycle (CWL), a context-management scheme that keeps a long-horizon LLM agent's context within a token budget through semantically aware eviction. In the reported evaluation, a single agent session completed 89 sequential tasks across 80 million tokens with no measurable degradation in task accuracy relative to per-task isolated sessions.
Quick Take
Unlike summarization-based compaction and recency truncation, CWL keeps the context the agent still depends on (user turns and the exploratory context it is actively reasoning over) and sheds content that is recoverable, such as action episodes whose effects are already persisted in the environment.
Key Points
- CWL keeps a long-horizon agent's context within a token budget using a deterministic, LLM-free eviction policy.
- The method preserves user turns and active reasoning context while shedding action episodes whose effects are already persisted in the environment.
- CWL completed 89 tasks across 80 million tokens with no measurable accuracy degradation.
- It avoids four limitations of summarization-based compaction: unpredictable lossiness, destruction of causal structure, blocking model cost, and compression-induced hallucination.
- CWL is semantically aware, prioritizing content based on dependency rather than recency.
Article Content
From source RSS / original summary
arXiv:2606.11213v1
Abstract: We present Context Window Lifecycle (CWL), a context-management scheme that gives long-horizon LLM agents an effectively unbounded working horizon.
As a session accumulates history, CWL keeps the context within budget through graduated, semantically-aware eviction: the agent annotates its trajectory as typed, dependency-linked episodes as work proceeds, and a deterministic, LLM-free policy evicts content in priority order within that structure when a token budget is exceeded.
CWL preserves user turns and the exploratory context the agent is actively reasoning over, while aggressively shedding action episodes whose effects are already persisted in the environment, keeping active context near a stable ceiling that also avoids the performance degradation associated with very large prompts. Compared to summarization-based compaction, CWL avoids four well-known limitations: unpredictable lossiness, destruction of causal structure, blocking model cost, and compression-induced hallucination.
Compared to recency truncation, CWL is semantically aware: it drops the oldest-and-most-recoverable content according to the dependency graph rather than oldest-in-time regardless of relevance. We describe the annotation protocol, the episode graph, the eviction policy, and the token-accounting loop, and evaluate CWL on long-horizon agentic benchmarks: a single agent session completing 89 sequential tasks across 80 million tokens with no measurable degradation in task accuracy relative to per-task isolated sessions.
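The abstract does not give the policy's details, so the following is only a minimal sketch of how dependency-aware eviction might differ from recency truncation. Everything in it is an assumption made for illustration: the names `Episode`, `Kind`, `eviction_tier`, `evict`, and `recency_truncate`, the two-tier priority order, the `active` and `persisted` flags, and the token counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    USER = "user"        # user turns: never evicted in this sketch
    EXPLORE = "explore"  # exploratory context
    ACTION = "action"    # actions taken in the environment


@dataclass
class Episode:
    eid: int                  # doubles as creation order (lower = older)
    kind: Kind
    tokens: int
    persisted: bool = False   # effects already saved in the environment
    active: bool = False      # agent is still reasoning over this episode
    deps: set[int] = field(default_factory=set)  # episodes this one relies on


def eviction_tier(ep):
    """Assumed priority order: lower tier is evicted sooner, None is protected."""
    if ep.kind is Kind.ACTION and ep.persisted:
        return 0  # recoverable from the environment
    if ep.kind is Kind.EXPLORE and not ep.active:
        return 1  # stale exploration
    return None   # user turns, active exploration, unpersisted actions


def evict(episodes, budget):
    """Deterministic, model-free eviction until the token total fits the budget."""
    live = {ep.eid: ep for ep in episodes}
    total = sum(ep.tokens for ep in live.values())
    evicted = []
    while total > budget:
        # Anything a live episode still depends on must stay for now.
        needed = set().union(*(ep.deps for ep in live.values()))
        candidates = [
            (eviction_tier(ep), ep.eid)
            for ep in live.values()
            if ep.eid not in needed and eviction_tier(ep) is not None
        ]
        if not candidates:
            break  # nothing left that is safe to drop
        _, eid = min(candidates)  # lowest tier first, then oldest
        total -= live.pop(eid).tokens
        evicted.append(eid)
    return evicted, list(live.values())


def recency_truncate(episodes, budget):
    """Baseline for contrast: drop oldest-in-time regardless of type."""
    kept, dropped = list(episodes), []
    while kept and sum(ep.tokens for ep in kept) > budget:
        dropped.append(kept.pop(0).eid)
    return dropped


def demo_session():
    return [
        Episode(0, Kind.USER, 100),
        Episode(1, Kind.EXPLORE, 300),
        Episode(2, Kind.ACTION, 500, persisted=True, deps={1}),
        Episode(3, Kind.ACTION, 400, persisted=True, deps={2}),
        Episode(4, Kind.EXPLORE, 200, active=True, deps={0}),
        Episode(5, Kind.USER, 50),
    ]


evicted, kept = evict(demo_session(), budget=700)
print(evicted)                                     # [3, 2]
print(recency_truncate(demo_session(), budget=700))  # [0, 1, 2]
```

In this toy session the dependency-aware policy drops the two persisted action episodes and keeps both user turns, while the recency baseline drops the first user turn before anything else. The sketch also stops early when no safe candidate remains, which means it can end above budget; how CWL handles that case is not stated in the abstract.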