Context Pruning for Coding Agents via Multi-Rubric Latent Reasoning
Quick Take
LaMR (Latent Multi-Rubric) is a context-pruning framework for coding agents that decomposes code relevance into two dimensions: semantic evidence and dependency support. On four benchmarks it wins 12 of 16 head-to-head multi-turn comparisons, saves up to 31% more tokens on multi-turn agent tasks, and improves Exact Match by up to +3.5 on single-turn tasks. By filtering out distracting code, it frequently matches or outperforms unpruned full-context baselines.
Key Points
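- LaMR uses a mixture-of-experts gating network that weights the per-rubric emission scores dynamically, conditioned on the query.
- It models code relevance along two dimensions, semantic evidence and dependency support. Each dimension has a dedicated CRF with its own transition dynamics, and a final CRF over the fused emissions makes the keep-or-prune decision.
- LaMR saves up to 31% more tokens on multi-turn agent tasks.
- It improves Exact Match by up to +3.5 on single-turn tasks.
- By filtering distracting code out of the context, LaMR frequently matches or outperforms unpruned full-context baselines.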
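The gating-and-fusion step described in these points can be illustrated with a small pure-Python sketch. It assumes a linear gate followed by a softmax over rubrics, a weighted sum of per-rubric emissions, and standard Viterbi decoding for the final linear-chain CRF. The function names `gate_weights`, `fuse_emissions`, and `viterbi`, the toy scores, and the transition values are hypothetical and not taken from the paper. Training, including the per-rubric CRFs, is omitted.

```python
import math
from typing import List, Sequence

# Per-line labels: 0 = prune, 1 = keep.
PRUNE, KEEP = 0, 1


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(query_vec: Sequence[float], gate_matrix: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical linear gate: one weight row per rubric, softmax-normalized over rubrics."""
    logits = [sum(w * q for w, q in zip(row, query_vec)) for row in gate_matrix]
    return softmax(logits)


def fuse_emissions(rubric_emissions: List[List[List[float]]], weights: List[float]) -> List[List[float]]:
    """Weighted sum of per-rubric emissions; rubric_emissions[r][t][y] scores label y on line t."""
    num_lines = len(rubric_emissions[0])
    return [
        [sum(w * em[t][y] for w, em in zip(weights, rubric_emissions)) for y in (PRUNE, KEEP)]
        for t in range(num_lines)
    ]


def viterbi(emissions: List[List[float]], transitions: List[List[float]]) -> List[int]:
    """Highest-scoring keep/prune sequence under a linear-chain CRF.

    transitions[a][b] scores moving from label a on line t-1 to label b on line t.
    """
    score = list(emissions[0])
    backptrs = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for y in (PRUNE, KEEP):
            best_prev = max((PRUNE, KEEP), key=lambda p: score[p] + transitions[p][y])
            new_score.append(score[best_prev] + transitions[best_prev][y] + emissions[t][y])
            ptr.append(best_prev)
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final label.
    best = max((PRUNE, KEEP), key=lambda y: score[y])
    path = [best]
    for ptr in reversed(backptrs):
        best = ptr[best]
        path.append(best)
    return path[::-1]


# Toy example over four lines: [prune_score, keep_score] per line, per rubric.
semantic = [[0.0, 2.0], [0.0, 1.5], [1.0, 0.0], [1.0, 0.0]]
dependency = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
query = [1.0, 0.0]
gate = [[0.5, 0.0], [0.5, 0.0]]  # equal logits, so equal rubric weights
weights = gate_weights(query, gate)
fused = fuse_emissions([semantic, dependency], weights)
transitions = [[0.2, 0.0], [0.0, 0.2]]  # mild preference for staying in the same label
keep_mask = viterbi(fused, transitions)
print(keep_mask)
```

With these toy scores the semantic rubric favors lines 0 and 1 (0-based), the dependency rubric favors line 3, and the decoded mask is `[1, 1, 0, 1]`: a sparse support line survives alongside a contiguous semantic span. In a learned version the gate would shift these weights per query rather than splitting them evenly.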
Paper Resources
Abstract: LLM-powered coding agents spend the majority of their token budget reading repository files, yet much of the retrieved code is irrelevant to the task at hand. Existing learned pruners compress this context with a single-objective sequence labeler, collapsing all facets of code relevance into one score and one transition matrix. We show that this formulation creates a modeling bottleneck: a single CRF transition prior must serve heterogeneous retention patterns, including contiguous semantic spans and sparse structural support lines. We propose LaMR (Latent Multi-Rubric), a structured pruning framework that decomposes code relevance into two interpretable quality dimensions, semantic evidence and dependency support, each modeled by a dedicated CRF with dimension-specific transition dynamics. A mixture-of-experts gating network dynamically weights the per-rubric emissions conditioned on the query, and a final CRF layer on the fused emissions produces the aggregate keep-or-prune decision. To supervise each dimension without additional annotation cost, we derive multi-rubric labels from the existing training corpus via AST-based program analysis, simultaneously denoising the teacher's binary labels. By effectively filtering distracting noise, LaMR frequently matches or even outperforms unpruned full-context baselines. Experiments on four benchmarks (SWE-Bench Verified, SWE-QA, LCC, LongCodeQA) show that LaMR wins 12 of 16 head-to-head multi-turn comparisons. It saves up to 31% more tokens on multi-turn agent tasks and improves Exact Match by up to +3.5 on single-turn tasks, while performance is frequently enhanced by denoising the context, and any remaining drops are marginal.
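The abstract does not specify how the AST analysis assigns dependency-support labels. One plausible rule, sketched here with Python's standard `ast` module purely as an assumption, marks a module-level statement as support when it binds a name that is read on a line already labeled as semantic evidence. The helpers `defined_names` and `dependency_support_lines` and the rule itself are hypothetical, and the teacher-label denoising step is not shown.

```python
import ast
from typing import Set


def defined_names(node: ast.AST) -> Set[str]:
    """Names bound by a statement: imports, def/class, and simple assignments (hypothetical scope)."""
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
        return {node.name}
    if isinstance(node, (ast.Import, ast.ImportFrom)):
        # `import a.b` binds `a`; `import x as y` binds `y`.
        return {(alias.asname or alias.name).split(".")[0] for alias in node.names}
    if isinstance(node, ast.Assign):
        return {t.id for t in node.targets if isinstance(t, ast.Name)}
    return set()


def dependency_support_lines(source: str, evidence_lines: Set[int]) -> Set[int]:
    """Return 1-based line numbers of definitions that evidence lines read from.

    Hypothetical rule: a statement is 'dependency support' if it binds a name that is
    loaded on some evidence line and it is not itself an evidence line.
    """
    tree = ast.parse(source)
    # Names read (Load context) on any evidence line.
    used = {
        n.id
        for n in ast.walk(tree)
        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load) and n.lineno in evidence_lines
    }
    support: Set[int] = set()
    for stmt in tree.body:  # module-level statements only, for brevity
        if defined_names(stmt) & used and stmt.lineno not in evidence_lines:
            support.add(stmt.lineno)
    return support


SOURCE = '''import os
from pathlib import Path

DEFAULT_DIR = "data"
UNUSED = 42

def load(name):
    return Path(DEFAULT_DIR) / name
'''

# Suppose the teacher kept only line 8 as semantic evidence.
support = dependency_support_lines(SOURCE, {8})
print(sorted(support))
```

Here line 8 reads `Path` and `DEFAULT_DIR`, so lines 2 and 4 become sparse support lines while the unused `import os` and `UNUSED` stay prunable. This kind of scattered, non-contiguous pattern is what the abstract argues a single transition prior handles poorly.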
| Subjects: | Artificial Intelligence (cs.AI); Computation and Language (cs.CL) |
| Cite as: | arXiv:2605.15315 [cs.AI] |
| | (or arXiv:2605.15315v1 [cs.AI] for this version) |
| | https://doi.org/10.48550/arXiv.2605.15315 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Jing Wang
[v1] Thu, 14 May 2026 18:30:10 UTC (167 KB)
— Originally published at arxiv.org