Exploring Agentic Tool-Calling Decisions via Uncertainty-Aligned Reinforcement Learning
Quick Answer
TRUST is a post-training framework for LLM-based agents that builds uncertainty quantification into reward design, so that correct and incorrect tool-calling decisions stay separable by their uncertainty.
Quick Take
Across diverse tool-use benchmarks, TRUST consistently improves decision quality and agent performance while keeping uncertainty estimates more reliable during optimization.
Key Points
- TRUST incorporates uncertainty quantification to improve decision-making in LLM agents.
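- Decision-oriented reinforcement learning tends to weaken the uncertainty separation between correct and incorrect actions, producing overconfident mistakes and weaker exploration signals.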
- Experimental results show enhanced decision quality across diverse benchmarks.
- The framework maintains reliable uncertainty estimates during optimization.
- Lightweight key-turn annotations label multi-turn trajectories for unified post-training.
Article Excerpt
arXiv:2606.06976v1. Abstract: Large language model (LLM)-based agents often make suboptimal decisions, including unsupported tool invocation and hallucinated direct responses, which may accumulate errors throughout multi-step interactions. Existing approaches mainly improve these behaviors through inference-time correction or coarse-grained reward signals based on decision outcomes and structured checklists, leaving the uncertainty characteristics of agent decisions underexplored.
We observe that decision-oriented reinforcement learning tends to weaken the uncertainty separation between correct and incorrect actions, resulting in overconfident mistakes and weaker exploration signals. Therefore, we propose TRUST, which incorporates uncertainty quantification into reward design as a repulsive force for maintaining uncertainty separation, and labels lightweight key-turn annotations for unified post-training of multi-turn trajectories.
Experimental results across diverse tool-use benchmarks show that TRUST consistently enhances both decision quality and agent performance while maintaining more reliable uncertainty estimates during optimization.
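The abstract does not give the reward formula, so the following is only a minimal sketch of what an uncertainty-aware reward of this kind might look like, not the authors' method. It assumes uncertainty is measured as the normalized entropy of the agent's decision distribution (for example, "call a tool" versus "answer directly"), and that the shaping term rewards low uncertainty on correct decisions and high uncertainty on incorrect ones, which pushes the two groups apart. The names `normalized_entropy` and `shaped_reward` and the `weight` parameter are hypothetical.

```python
import math
from typing import Sequence


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a decision distribution, scaled to [0, 1].

    Assumption: uncertainty is entropy over the agent's decision options.
    """
    n = len(probs)
    if n < 2:
        return 0.0  # a single option carries no uncertainty
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return max(0.0, entropy / math.log(n))


def shaped_reward(
    outcome_reward: float,
    is_correct: bool,
    uncertainty: float,
    weight: float = 0.5,  # hypothetical strength of the separation term
) -> float:
    """Outcome reward plus a term that keeps uncertainties separated.

    Correct decisions earn more when confident; incorrect decisions earn
    more when uncertain, so overconfident mistakes score lowest.
    """
    u = min(max(uncertainty, 0.0), 1.0)  # clamp to [0, 1]
    separation = (1.0 - u) if is_correct else u
    return outcome_reward + weight * separation


# Usage: a confident mistake scores below an uncertain mistake.
confident_wrong = shaped_reward(0.0, False, normalized_entropy([0.95, 0.05]))
uncertain_wrong = shaped_reward(0.0, False, normalized_entropy([0.5, 0.5]))
```

With `weight` below the gap between correct and incorrect outcome rewards, a correct decision still always outscores an incorrect one, so the shaping term only reorders decisions within each group.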