Capability Minimization as a Safety Primitive: Risk-Aware Causal Gating for Least-Privilege LLM Agents
Quick Answer
This paper shows that the Risk-Aware Causal Gating (RACG) framework improves decision-making in LLM agents by combining causal effect estimation with calibrated risk control, reducing high-cost errors while preserving most of the utility of an ungated policy.
Quick Take
RACG gates each decision on an estimated counterfactual risk rather than on raw predictive confidence, choosing whether to act on, defer, or abstain from a model's prediction. It outperforms confidence-based and selective-prediction baselines at matched abstention rates, offering a safer and more transparent approach to high-stakes automation.
Key Points
- RACG combines causal effect estimation with risk control for better decision-making.
- It reduces high-cost errors while preserving most of the utility of an ungated policy.
- The framework adapts to distribution shifts by monitoring outcome discrepancies.
- RACG outperforms confidence-based and selective-prediction baselines at matched abstention rates.
- Separating causal risk from predictive uncertainty makes automation safer and more transparent in high-stakes settings.
Article Content
arXiv:2606.13884v1
Abstract: Modern decision systems increasingly rely on learned components whose outputs may be confident yet wrong, exposing downstream actions to costly errors. We introduce Risk-Aware Causal Gating (RACG), a framework that decides whether to act on, defer, or abstain from a model's prediction by combining causal effect estimation with calibrated risk control.
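The three-way choice between acting, deferring, and abstaining can be pictured with a small sketch. The abstract does not give the actual decision rule, so the function name `gate`, the two cutoffs `act_below` and `abstain_above`, and the idea of comparing a single scalar risk against them are all assumptions made for illustration.

```python
def gate(counterfactual_risk: float, act_below: float, abstain_above: float) -> str:
    """Hypothetical three-way gate on an estimated counterfactual risk.

    Returns "act" for low risk, "abstain" for high risk, and "defer"
    (for example, to a human reviewer) for anything in between.
    """
    if act_below > abstain_above:
        raise ValueError("act_below must not exceed abstain_above")
    if counterfactual_risk <= act_below:
        return "act"
    if counterfactual_risk > abstain_above:
        return "abstain"
    return "defer"


# The gate looks only at estimated risk, so a prediction made with high
# confidence is still blocked when its estimated risk is high.
decision = gate(counterfactual_risk=0.9, act_below=0.1, abstain_above=0.5)
```

Note that predictive confidence does not appear in the signature at all; in this sketch it would influence the decision only through whatever procedure produces `counterfactual_risk`.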
RACG models the causal pathway from candidate actions to outcomes and gates each decision according to an estimated counterfactual risk rather than raw predictive confidence. To make gating reliable, we derive distribution-free bounds on the probability of acting under high-risk conditions and show how these bounds translate into operating thresholds that satisfy user-specified safety constraints.
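One generic way to turn a distribution-free bound into an operating threshold is sketched below. This is not the paper's derivation, which the abstract does not state: it swaps in a standard Hoeffding bound with a Bonferroni correction over a finite grid of candidate thresholds. The names `calibrate_threshold`, `alpha` (the user-specified cap on the probability of acting under high risk), and `delta` (the allowed failure probability of the bound) are illustrative assumptions, as is the requirement of a held-out calibration set with known high-risk labels.

```python
import math


def calibrate_threshold(scores, high_risk, alpha, delta, candidates):
    """Pick the largest threshold t such that, with probability >= 1 - delta,
    P(score <= t and the case is high-risk) <= alpha.

    scores     : estimated risk scores on a held-out calibration set
    high_risk  : booleans, True where the case really was high-risk
    candidates : finite grid of thresholds to consider
    Returns None when no candidate satisfies the constraint.
    """
    n = len(scores)
    # Hoeffding slack, with delta split across candidates (Bonferroni).
    slack = math.sqrt(math.log(len(candidates) / delta) / (2 * n))
    best = None
    for t in sorted(candidates):
        # Empirical rate of "would act AND the case was high-risk".
        bad = sum(1 for s, h in zip(scores, high_risk) if s <= t and h)
        if bad / n + slack <= alpha:
            best = t
    return best


# Synthetic calibration data: cases with score >= 0.5 are high-risk.
cal_scores = [i / 1000 for i in range(1000)]
cal_labels = [s >= 0.5 for s in cal_scores]
t_star = calibrate_threshold(cal_scores, cal_labels, alpha=0.1, delta=0.05,
                             candidates=[0.25, 0.5, 0.75])
```

The bound holds for any data distribution as long as the calibration cases are i.i.d. draws from the deployment distribution, which is why a shift in that distribution calls for a separate mechanism.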
We further propose an adaptive gating policy that adjusts to distribution shift by monitoring discrepancies between predicted and realized outcomes, tightening the gate when causal assumptions appear violated. Across simulated interventions and real-world decision benchmarks, RACG reduces high-cost errors substantially while preserving most of the utility of an ungated policy, and it outperforms confidence-based and selective-prediction baselines at matched abstention rates.
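The adaptive policy can be sketched as a monitor that tracks recent gaps between predicted and realized outcomes and tightens the gate when they grow. The abstract does not specify the discrepancy statistic or the update rule, so the rolling mean absolute error, the `tolerance` trigger, the multiplicative `shrink` factor, and the class name `AdaptiveGate` are all assumptions.

```python
from collections import deque


class AdaptiveGate:
    """Hypothetical gate that tightens when predictions drift from outcomes."""

    def __init__(self, base_threshold, tolerance, window=50, shrink=0.5):
        self.base_threshold = base_threshold  # risk cutoff under normal conditions
        self.tolerance = tolerance            # acceptable mean |predicted - realized|
        self.shrink = shrink                  # factor applied to the cutoff under drift
        self.errors = deque(maxlen=window)    # rolling window of recent discrepancies

    def observe(self, predicted, realized):
        """Record the gap between a predicted and a realized outcome."""
        self.errors.append(abs(predicted - realized))

    def threshold(self):
        """Return the current risk cutoff, tightened if drift is detected."""
        if not self.errors:
            return self.base_threshold
        mean_error = sum(self.errors) / len(self.errors)
        if mean_error > self.tolerance:
            return self.base_threshold * self.shrink
        return self.base_threshold

    def decide(self, risk):
        """Act only when the estimated risk is within the current cutoff."""
        return "act" if risk <= self.threshold() else "defer"


demo_gate = AdaptiveGate(base_threshold=0.4, tolerance=0.1, window=3)
before = demo_gate.decide(0.3)       # within the base cutoff of 0.4
for _ in range(3):
    demo_gate.observe(predicted=0.2, realized=0.9)  # large, persistent gaps
after = demo_gate.decide(0.3)        # cutoff has tightened to 0.2
```

Because the window is bounded, the gate relaxes back to the base cutoff once recent discrepancies fall below the tolerance again.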
Our results indicate that explicitly separating causal risk from predictive uncertainty yields decision systems that are both safer and more transparent, offering a principled mechanism for trustworthy automation in high-stakes settings.