AEGIS: A Backup Reflex for Physical AI
Quick Answer
AEGIS introduces a selective escalation method for robot manipulation, recovering 10.1% of lost trajectories on LIBERO-Spatial compared to 4.6% with blind escalation.
Quick Take
AEGIS hands control to the stronger policy on only 38% of steps, so its advantage over blind escalation comes from when it escalates rather than from how much extra compute it spends.
Key Points
- AEGIS employs a lightweight probe to detect high-risk steps in robot manipulation.
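- On LIBERO-Spatial it recovers 10.1% of the trajectories the weak policy alone loses, versus 4.6% for budget-matched blind escalation and 5.1% for a random-trigger placebo; both differences are statistically significant.
- The early-warning probe achieves an AUROC of 0.764 (95% CI [0.70, 0.84]) over the first 30% of trajectory steps, before any handoff.
- The result was confirmed on 700 common-random-number episodes per arm under a pre-registered analysis plan.
- Control switches to the stronger policy only on flagged steps, 38% of steps overall.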
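As a rough illustration of the AUROC figure above, the sketch below computes AUROC as the probability that a failing trajectory receives a higher early-window risk score than a succeeding one. The function names, the choice of taking the maximum per-step score over the window, and the toy numbers are assumptions made for illustration; the source does not say how the paper aggregates probe scores.

```python
from typing import Sequence


def early_window_score(step_scores: Sequence[float], fraction: float = 0.3) -> float:
    """Summarize per-step probe scores over the first `fraction` of steps.

    Taking the max is an assumption for this sketch, not the paper's rule.
    """
    k = max(1, int(len(step_scores) * fraction))
    return max(step_scores[:k])


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise AUROC: share of (positive, negative) pairs ranked correctly.

    labels: 1 = trajectory later failed, 0 = trajectory succeeded.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUROC of 0.5 would mean the early scores carry no ranking information, and 1.0 would mean every failing trajectory is scored above every succeeding one.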
Article Content
From source RSS / original summary
arXiv:2606.06660v1 Announce Type: new
Abstract: Long-horizon robot manipulation tends to fail gradually: one bad step degrades the state, and the policy spirals into a basin from which it cannot recover. The failure is often visible before it happens. We introduce AEGIS (Activation-probe Early-warning, Gated Inference Switching), a selective escalation method that uses a lightweight probe on a weak policy's frozen activations to detect high-risk steps while there is still time to act.
When the probe flags a step, control switches to a stronger separate policy, but only for the steps that need it. On LIBERO-Spatial, AEGIS recovers 10.1% of the trajectories the weak policy alone loses, versus 4.6% for budget-matched blind escalation and 5.1% for a random-trigger placebo. These gains are significant under one-sided exact paired McNemar tests with Holm-Bonferroni adjustment over three pre-registered contrasts: +5.4pp over blind escalation, p=8.5e-6; +5.0pp over random triggering, p=1.0e-4; paired-trajectory bootstrap CIs exclude zero.
AEGIS activates the stronger policy on only 38% of steps, so the lever is timing rather than compute. The probe clears its precondition with an early-window AUROC of 0.764, 95% CI [0.70, 0.84], read from the weak-policy path over the first 30% of trajectory steps before any handoff.
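The gating step described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes the probe is a logistic-regression head over the weak policy's activations and that the weak policy returns its activations alongside its action. The names `probe_risk`, `gated_step`, `weights`, `bias`, and `threshold` are all hypothetical.

```python
import math
from typing import Any, Callable, Sequence, Tuple


def probe_risk(activations: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic probe (an assumed form): maps frozen activations to a risk in (0, 1)."""
    z = sum(a * w for a, w in zip(activations, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def gated_step(
    obs: Any,
    weak_policy: Callable[[Any], Tuple[Any, Sequence[float]]],
    strong_policy: Callable[[Any], Any],
    weights: Sequence[float],
    bias: float,
    threshold: float = 0.5,
) -> Tuple[Any, bool]:
    """Return (action, escalated) for a single control step.

    The weak policy always runs, because the probe reads its activations.
    The strong policy runs only when the risk reaches the threshold.
    """
    action, activations = weak_policy(obs)
    risk = probe_risk(activations, weights, bias)
    if risk >= threshold:
        return strong_policy(obs), True
    return action, False
```

In this sketch the threshold would control the fraction of steps that escalate, which is one plausible way to match a compute budget such as the 38% figure; how the paper sets its trigger is not stated in the abstract.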
We pre-register the full analysis plan, including a conditional recovered-task-rate estimand and explicit kill criteria, and confirm the result on 700 common-random-number episodes per arm, with n_{A-fail}=646.
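The significance procedure named in the abstract, a one-sided exact paired McNemar test followed by Holm-Bonferroni adjustment, can be sketched with the standard textbook definitions. This is an illustration only: the function names are hypothetical, and the discordant-pair counts behind the paper's p-values are not given in the source, so none are reproduced here.

```python
from math import comb
from typing import List, Sequence


def mcnemar_exact_one_sided(b: int, c: int) -> float:
    """Exact one-sided McNemar p-value from discordant pair counts.

    b: pairs where method A succeeds and method B fails.
    c: pairs where method B succeeds and method A fails.
    Under the null, b ~ Binomial(b + c, 0.5); the p-value is P(X >= b).
    """
    n = b + c
    if n == 0:
        return 1.0
    return sum(comb(n, k) for k in range(b, n + 1)) / 2 ** n


def holm_adjust(pvals: Sequence[float]) -> List[float]:
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        # Multiply the rank-th smallest p-value by the number of hypotheses
        # still in play, cap at 1, and enforce monotonicity.
        running = max(running, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running
    return adjusted
```

Only the discordant pairs enter the test, which is why a paired design on common-random-number episodes can detect a difference of a few percentage points: trajectories on which both arms agree contribute nothing to the statistic.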
