Adaptive Latent Agentic Reasoning
Quick Take
The Adaptive Latent Agentic Reasoning (ALAR) framework makes LLM agents more efficient by using compact latent reasoning for routine decisions and reserving explicit chain-of-thought for harder ones. It generates up to 43.6% fewer tokens in agentic search and up to 84.6% fewer in tool use, while keeping task accuracy comparable or better.
Key Points
- ALAR reduces reasoning verbosity in LLM agents, improving efficiency.
- Achieves up to 43.6% fewer tokens in agentic search tasks.
- Reduces generated tokens by up to 84.6% in tool-use scenarios.
- Maintains comparable or better task accuracy with adaptive reasoning.
- Allocates reasoning effort per turn: latent reasoning for routine decisions, explicit chain-of-thought for harder ones.
Article Content
From source RSS / original summary
arXiv:2606.02871v1 (Announce Type: new)
Abstract: Large reasoning models improve performance by generating extended chain-of-thought (CoT) reasoning, but this behavior becomes inefficient when applied to LLM agents. Current LLM agents often generate verbose textual reasoning at every decision step and allocate reasoning effort nearly uniformly across turns, leading to substantial inefficiency in multi-turn agentic trajectories.
We propose Adaptive Latent Agentic Reasoning (ALAR), a dual-mode framework that uses compact latent reasoning for routine turns and selectively escalates to explicit chain-of-thought when deeper deliberation is needed. ALAR learns latent reasoning by using the agent's actions as supervision anchors and is further optimized to use latent reasoning when it is sufficient for task success and reserve explicit CoT for harder decisions.
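The dual-mode idea can be illustrated with a minimal sketch. It is not the paper's method: the abstract says ALAR is optimized to decide when latent reasoning suffices, whereas this example substitutes a fixed threshold on a hypothetical per-turn `difficulty` score. The names `Turn`, `choose_mode`, and `plan_trajectory` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    description: str
    # Hypothetical difficulty score in [0, 1]; the abstract does not define one.
    difficulty: float


def choose_mode(turn: Turn, threshold: float = 0.7) -> str:
    """Pick a reasoning mode for one turn.

    Assumption: a fixed threshold stands in for ALAR's learned decision.
    """
    return "explicit" if turn.difficulty >= threshold else "latent"


def plan_trajectory(turns: List[Turn], threshold: float = 0.7) -> List[str]:
    """Assign a reasoning mode to every turn in a multi-turn trajectory."""
    return [choose_mode(turn, threshold) for turn in turns]


# Illustrative trajectory; descriptions and scores are made up.
turns = [
    Turn("issue a search query", 0.2),
    Turn("reconcile conflicting evidence", 0.9),
    Turn("open the top result", 0.1),
]
modes = plan_trajectory(turns)
```

In this toy trajectory only the middle turn escalates to explicit chain-of-thought, which mirrors the abstract's point that reasoning effort need not be uniform across turns.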
Experiments on agentic search and tool-use benchmarks show that ALAR maintains comparable or better task accuracy while substantially reducing generated tokens by up to 43.6% in search and 84.6% in tool use. These results demonstrate that ALAR improves the accuracy-efficiency trade-off of LLM agents by reducing unnecessary textual reasoning while preserving explicit deliberation for harder decision steps.
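To make the reported percentages concrete, the short sketch below shows how a relative token reduction is computed and what it implies for a trajectory of a given size. The 1,000-token baseline is a hypothetical figure chosen for round numbers, not a measurement from the paper, and the function names are assumptions.

```python
def token_reduction(baseline_tokens: int, method_tokens: int) -> float:
    """Fraction of tokens saved relative to the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 1.0 - method_tokens / baseline_tokens


def tokens_after_reduction(baseline_tokens: int, reduction: float) -> float:
    """Tokens remaining after applying a fractional reduction."""
    return baseline_tokens * (1.0 - reduction)


# Hypothetical baseline of 1,000 generated tokens per trajectory.
baseline = 1000
search_tokens = round(tokens_after_reduction(baseline, 0.436))    # best-case search figure
tool_use_tokens = round(tokens_after_reduction(baseline, 0.846))  # best-case tool-use figure
```

Because both figures are reported as "up to", they are upper bounds on the savings rather than averages across benchmarks.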