Beyond Compaction: Structured Context Eviction for Long-Horizon Agents

arXiv cs.CL·Andrew Semenov, Svyatoslav Dorofeev

2d ago

·~2 min·6/11/2026·en·0

Quick Answer

The paper presents Context Window Lifecycle (CWL), a context-management scheme that keeps long-horizon LLM agents within a token budget through semantically aware eviction. In the reported evaluation, a single agent session completed 89 sequential tasks across 80 million tokens with no measurable degradation in task accuracy relative to per-task isolated sessions.

Quick Take

CWL replaces summarization-based compaction and recency truncation with structured eviction. The agent annotates its trajectory as typed, dependency-linked episodes, and a deterministic, LLM-free policy evicts them in priority order once the token budget is exceeded. User turns and the exploratory context the agent is actively reasoning over are kept, while action episodes whose effects are already persisted in the environment are shed first.

Key Points

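CWL keeps long-horizon agents within a token budget using a deterministic, LLM-free eviction policy.
It preserves user turns and the exploratory context the agent is actively reasoning over, while shedding action episodes whose effects are already persisted in the environment.
A single agent session completed 89 sequential tasks across 80 million tokens with no measurable degradation in task accuracy relative to per-task isolated sessions.
Compared to summarization-based compaction, it avoids unpredictable lossiness, destruction of causal structure, blocking model cost, and compression-induced hallucination.
Compared to recency truncation, it is semantically aware: it evicts the oldest and most recoverable content according to the dependency graph, not simply the oldest content.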


Article Content

From source RSS / original summary

arXiv:2606.11213v1. Announce Type: new. Abstract: We present Context Window Lifecycle (CWL), a context-management scheme that gives long-horizon LLM agents an effectively unbounded working horizon.

As a session accumulates history, CWL keeps the context within budget through graduated, semantically-aware eviction: the agent annotates its trajectory as typed, dependency-linked episodes as work proceeds, and a deterministic, LLM-free policy evicts content in priority order within that structure when a token budget is exceeded.
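The abstract does not spell out the episode schema or the priority order, so the following is only a minimal sketch of the idea under stated assumptions: the `Episode` fields, the kind labels (`"user"`, `"explore"`, `"action"`), and the `EVICTION_RANK` table are hypothetical names chosen for illustration. It shows typed episodes being evicted in a fixed, reproducible order, with no model call, once a token budget is exceeded. Dependency links are recorded here but not consulted by this simplified policy.

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    """One annotated span of the trajectory (field names are hypothetical)."""
    id: str
    kind: str                      # assumed labels: "user", "explore", "action"
    tokens: int
    depends_on: list = field(default_factory=list)  # ids of upstream episodes

# Lower rank is evicted sooner. This ordering is an assumption, not the paper's table.
# "user" is absent on purpose: user turns are never candidates for eviction.
EVICTION_RANK = {"action": 0, "explore": 1}

def evict_to_budget(episodes, budget):
    """Return the episodes kept after deterministic, LLM-free eviction."""
    total = sum(e.tokens for e in episodes)
    # Sort candidate positions by (rank, age) so the order is fully reproducible.
    candidates = sorted(
        (i for i, e in enumerate(episodes) if e.kind in EVICTION_RANK),
        key=lambda i: (EVICTION_RANK[episodes[i].kind], i),
    )
    evicted = set()
    for i in candidates:
        if total <= budget:        # stop as soon as the context fits
            break
        evicted.add(i)
        total -= episodes[i].tokens
    return [e for i, e in enumerate(episodes) if i not in evicted]

history = [
    Episode("u1", "user", 50),
    Episode("a1", "action", 400),
    Episode("x1", "explore", 300, depends_on=["u1"]),
    Episode("a2", "action", 350, depends_on=["x1"]),
]
kept = evict_to_budget(history, budget=500)
print([e.id for e in kept])  # ['u1', 'x1']
```

Because the policy is a pure function of the annotated history and the budget, the same trajectory always yields the same retained context, which is one way to read the "deterministic" claim.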

CWL preserves user turns and the exploratory context the agent is actively reasoning over, while aggressively shedding action episodes whose effects are already persisted in the environment, keeping active context near a stable ceiling that also avoids the performance degradation associated with very large prompts. Compared to summarization-based compaction, CWL avoids four well-known limitations: unpredictable lossiness, destruction of causal structure, blocking model cost, and compression-induced hallucination.
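To illustrate how shedding persisted action episodes could hold the active context near a ceiling while the cumulative token count keeps growing, here is a small simulation. Everything in it is an assumption made for the sketch: the `active` and `persisted` flags, the two eviction tiers, and the per-episode token sizes are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)              # identity-based equality, so remove() targets one object
class Episode:
    kind: str                     # assumed labels: "user", "explore", "action"
    tokens: int
    active: bool = True           # still being reasoned over
    persisted: bool = False       # effect already stored in the environment

def eviction_tier(e: Episode) -> Optional[int]:
    """Lower tier is shed first; None means protected (assumed ordering)."""
    if e.kind == "action" and e.persisted:
        return 0
    if e.kind == "explore" and not e.active:
        return 1
    return None                   # user turns, active exploration, unpersisted actions

def enforce_budget(context: list, budget: int) -> None:
    """Evict in place, tier by tier, oldest first, until the context fits."""
    total = sum(e.tokens for e in context)
    for tier in (0, 1):
        for e in list(context):   # iterate over a copy, oldest first
            if total <= budget:
                return
            if eviction_tier(e) == tier:
                context.remove(e)
                total -= e.tokens

def run_session(num_tasks: int, budget: int):
    """Simulate sequential tasks; return (peak active tokens, cumulative tokens)."""
    context, peak, seen = [], 0, 0
    for _ in range(num_tasks):
        task = [Episode("user", 20), Episode("explore", 100), Episode("action", 200)]
        for ep in task:
            context.append(ep)
            seen += ep.tokens
            enforce_budget(context, budget)
            peak = max(peak, sum(e.tokens for e in context))
        task[1].active = False    # this task's exploration is finished
        task[2].persisted = True  # the action's effect now lives in the environment
    return peak, seen

peak, seen = run_session(num_tasks=50, budget=1500)
print(f"peak active context: {peak} tokens, cumulative: {seen} tokens")
```

In this toy setup the session processes far more tokens than the budget, yet the active context never exceeds it, as long as the protected episodes (user turns plus the current task) fit within the budget on their own.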

Compared to recency truncation, CWL is semantically aware: it drops the oldest-and-most-recoverable content according to the dependency graph rather than oldest-in-time regardless of relevance. We describe the annotation protocol, the episode graph, the eviction policy, and the token-accounting loop, and evaluate CWL on long-horizon agentic benchmarks: a single agent session completing 89 sequential tasks across 80 million tokens with no measurable degradation in task accuracy relative to per-task isolated sessions.
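The contrast with recency truncation can be made concrete with a toy example. The sketch below is an interpretation, not the paper's algorithm: it assumes that episodes reachable through `depends_on` links from the currently active episodes are pinned, and that everything else is evicted oldest first. The `Episode` fields and the function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Episode:
    id: str
    tokens: int
    depends_on: tuple = ()        # ids of episodes this one builds on

def recency_truncate(history, budget):
    """Baseline: drop the oldest episodes until the rest fits."""
    kept = list(history)
    while kept and sum(e.tokens for e in kept) > budget:
        kept.pop(0)
    return [e.id for e in kept]

def pinned_ids(history, active):
    """Active episodes plus everything they transitively depend on."""
    by_id = {e.id: e for e in history}
    pinned, stack = set(), list(active)
    while stack:
        eid = stack.pop()
        if eid in pinned or eid not in by_id:
            continue
        pinned.add(eid)
        stack.extend(by_id[eid].depends_on)
    return pinned

def dependency_aware_evict(history, budget, active):
    """Drop the oldest unpinned episodes first (assumed policy for this sketch)."""
    pinned = pinned_ids(history, active)
    kept = list(history)
    total = sum(e.tokens for e in kept)
    for e in history:             # oldest first
        if total <= budget:
            break
        if e.id not in pinned:
            kept.remove(e)
            total -= e.tokens
    return [e.id for e in kept]

history = [
    Episode("spec", 300),
    Episode("build1", 400, depends_on=("spec",)),
    Episode("build2", 400, depends_on=("spec",)),
    Episode("debug", 300, depends_on=("spec",)),
]
print(recency_truncate(history, 800))                          # ['build2', 'debug']
print(dependency_aware_evict(history, 800, active=["debug"]))  # ['spec', 'debug']
```

Recency truncation discards `spec` simply because it is oldest, even though the active `debug` episode depends on it. The dependency-aware variant keeps `spec` and drops the two completed build episodes instead.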

