Uncertainty Decomposition for Clarification Seeking in LLM Agents
Quick Answer
This paper introduces a prompt-based uncertainty decomposition for LLM agents that separates action confidence from request uncertainty, so the agent can ask for clarification when a task specification is ambiguous.
Quick Take
Averaged across five LLM backbones, including GPT-5.1 and GLM-4.7, the proposed method improves clarification F1 on ALFWorld-Clarification by 73% over ReAct+UE and by 36% over UAM. It is evaluated on two new benchmarks, WebShop-Clarification and ALFWorld-Clarification, in which 50% of tasks are deliberately underspecified.
Key Points
- Introduces a prompt-based method for decomposing uncertainty in LLM agents.
- Enhances proactive clarification seeking in ambiguous task specifications.
- Improves clarification F1 on ALFWorld-Clarification by 73% over ReAct+UE and by 36% over UAM, averaged across five backbones.
- Evaluated across five LLM backbones including GPT-5.1 and GLM-4.7.
- Introduces two benchmarks, WebShop-Clarification and ALFWorld-Clarification, in which 50% of tasks are deliberately underspecified.
Article Content
From source RSS / original summary
arXiv:2606.19559v1 Announce Type: new
Abstract: Recent position papers argue that the classical aleatoric/epistemic uncertainty framework is insufficient for interactive large language model (LLM) agents and call for underspecification-aware, decomposed, and communicable uncertainty representations that can unlock new agent capabilities such as proactive clarification seeking and shared mental-model building.
Practical deployment constraints -- black-box APIs, interactive latency budgets, and the absence of labeled trajectories -- rule out logprob-based, multi-sampling, and training-based methods, leaving prompt-based estimation as the most viable family for surfacing such signals at deployment time. We answer this call with a simple prompt-based decomposition that separates action confidence from request uncertainty (u), enabling the agent to ask for clarification when the task specification is ambiguous.
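As a rough illustration of what a prompt-based decomposition could look like, the sketch below asks the model to report two separate scores and routes to a clarification question when the request itself looks ambiguous. The prompt wording, the field names `ACTION_CONFIDENCE` and `REQUEST_UNCERTAINTY`, the 0-to-1 scale, and the `ask_threshold` value are all assumptions made for this example; the abstract does not specify the paper's actual prompt or decision rule.

```python
import re
from dataclasses import dataclass

# Hypothetical prompt suffix; the paper's real wording is not given in the abstract.
PROMPT_SUFFIX = (
    "After choosing an action, report two numbers between 0 and 1:\n"
    "ACTION_CONFIDENCE: how sure you are that the action is right for the task as you understand it\n"
    "REQUEST_UNCERTAINTY: how ambiguous or underspecified the user's request is\n"
)


@dataclass
class DecomposedUncertainty:
    action_confidence: float
    request_uncertainty: float


def parse_scores(reply: str) -> DecomposedUncertainty:
    """Extract the two self-reported scores from a model reply."""

    def grab(name: str) -> float:
        match = re.search(rf"{name}\s*:\s*(\d+(?:\.\d+)?)", reply)
        if match is None:
            raise ValueError(f"missing field: {name}")
        # Clamp to [0, 1] in case the model reports an out-of-range value.
        return min(1.0, max(0.0, float(match.group(1))))

    return DecomposedUncertainty(
        action_confidence=grab("ACTION_CONFIDENCE"),
        request_uncertainty=grab("REQUEST_UNCERTAINTY"),
    )


def decide(scores: DecomposedUncertainty, ask_threshold: float = 0.5) -> str:
    """Ask the user when the request looks ambiguous; otherwise act.

    The threshold rule is an assumption for illustration only.
    """
    if scores.request_uncertainty >= ask_threshold:
        return "ask_clarification"
    return "act"


# Example with a hand-written reply standing in for a black-box API response.
reply = "ACTION: search[mug]\nACTION_CONFIDENCE: 0.9\nREQUEST_UNCERTAINTY: 0.7"
print(decide(parse_scores(reply)))
```

Keeping the two scores separate means an agent can be confident in its next action and still flag that the request is underspecified, which a single confidence value could not express. Only one model call is needed, which fits the black-box and latency constraints described above.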
To evaluate it, we introduce two clarification-augmented benchmarks (WebShop-Clarification and ALFWorld-Clarification) in which 50% of tasks are deliberately underspecified, and systematically compare the proposed decomposition against ReAct+UE and Uncertainty-Aware Memory (UAM) across five LLM backbones (GPT-5.1, DeepSeek-v3.2-exp, GLM-4.7, Qwen3.5-35B, GPT-OSS-120B) on these variants together with the standard WebShop, ALFWorld, and REAL benchmarks for fault detection.
Averaged across the five backbones, the proposed decomposition improves clarification F1 on ALFWorld-Clarification by 73% over ReAct+UE and by 36% over UAM, and leads clarification F1 on every backbone on WebShop-Clarification and on four of five backbones on ALFWorld-Clarification, indicating that the gains generalize beyond a single LLM.
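For readers unfamiliar with the metric, the sketch below shows one plausible way to score clarification decisions: treat "the agent asked" as a positive prediction and "the task was underspecified" as the positive label, then compute F1. It also shows a relative-improvement calculation. Both the scoring convention and the reading of the reported percentages as relative gains are assumptions; the abstract does not define either, and the data here is made up for illustration.

```python
def clarification_f1(asked: list[bool], underspecified: list[bool]) -> float:
    """F1 for clarification decisions.

    asked[i]          -- True if the agent asked for clarification on task i
    underspecified[i] -- True if task i was actually underspecified
    """
    pairs = list(zip(asked, underspecified))
    tp = sum(1 for a, u in pairs if a and u)      # asked when it should
    fp = sum(1 for a, u in pairs if a and not u)  # asked needlessly
    fn = sum(1 for a, u in pairs if not a and u)  # failed to ask
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, e.g. 0.73 for a 73% gain."""
    return (new - baseline) / baseline


# Toy data: four tasks, half of them underspecified.
asked = [True, True, False, False]
underspecified = [True, False, True, False]
print(clarification_f1(asked, underspecified))
```

Under this convention, an agent that asks on every task gets perfect recall but is penalized on precision, so F1 rewards asking only when the specification is actually ambiguous.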