Progressive Autonomy as Preference Learning: A Formalization of Trust Calibration for Agentic Tool Use
Quick Take
The paper formalizes trust calibration in agentic tool use as a preference-learning problem.
Key Points
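- Introduces a policy gateway that maintains a Gaussian-process posterior over a latent human risk-tolerance function.
- Observes binary approve/deny feedback through a probit likelihood and escalates to the human where the approval outcome is most uncertain.
- Classifies the action space into allow/block/ask regions rather than optimizing a design, which is where it departs from Preferential Bayesian Optimization.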
Abstract: We formalize trust calibration for agentic tool use (deciding when an automated agent's proposed action may execute autonomously versus require human approval) as a preference-learning problem. A policy gateway maintains a Gaussian-process posterior over a latent human risk-tolerance function, observed through a probit likelihood on binary approve/deny feedback, and escalates to the human exactly where the approval outcome is most uncertain. We show this is structurally an instance of Preferential Bayesian Optimization, inheriting its inference machinery (approximate Gaussian-process classification) and its sample-efficiency argument (uncertainty-targeted querying), while differing in objective: classifying an action space into allow/block/ask regions rather than optimizing a design.
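The escalation rule described in the abstract can be illustrated with a minimal sketch. It assumes the Gaussian-process posterior at a proposed action has already been summarized by a latent mean and variance, and it uses the standard closed form for a probit likelihood averaged over a Gaussian, Φ(μ / √(1 + σ²)). The function names and the `margin` threshold are hypothetical choices made for illustration; the abstract does not specify how the allow/block/ask boundaries are set.

```python
import math


def probit_approval_probability(mean: float, variance: float) -> float:
    """Predictive P(approve) when the latent risk-tolerance value at an
    action has Gaussian posterior N(mean, variance) and feedback follows
    a probit likelihood: Phi(mean / sqrt(1 + variance))."""
    z = mean / math.sqrt(1.0 + variance)
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gateway_decision(mean: float, variance: float, margin: float = 0.2) -> str:
    """Hypothetical three-way rule. `margin` is an assumed threshold,
    not a value taken from the paper."""
    p = probit_approval_probability(mean, variance)
    if p >= 1.0 - margin:
        return "allow"  # approval is confidently predicted
    if p <= margin:
        return "block"  # denial is confidently predicted
    return "ask"        # outcome is uncertain, so escalate to the human


if __name__ == "__main__":
    # (latent mean, latent variance) pairs chosen only for illustration.
    for mean, variance in [(2.0, 0.1), (-2.0, 0.1), (0.0, 1.0), (2.0, 25.0)]:
        print(mean, variance, gateway_decision(mean, variance))
```

In this sketch a large posterior variance pulls the predictive probability toward 0.5, so an action with a favorable latent mean but little supporting feedback still lands in the ask region.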
| Subjects: | Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC) |
| Cite as: | arXiv:2605.19151 [cs.AI] |
| | (or arXiv:2605.19151v1 [cs.AI] for this version) |
| | https://doi.org/10.48550/arXiv.2605.19151 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Changkun Ou
[v1] Mon, 18 May 2026 22:11:15 UTC (128 KB)
— Originally published at arxiv.org