AEGIS: A Backup Reflex for Physical AI


·~2 min·6/8/2026·en·0

Quick Answer

AEGIS is a selective escalation method for robot manipulation. On LIBERO-Spatial it recovers 10.1% of the trajectories the weak policy alone loses, compared with 4.6% for budget-matched blind escalation.

Quick Take

AEGIS hands control to a stronger policy on only 38% of steps, and only when a probe flags the step as high-risk. Its advantage over blind escalation therefore comes from when it escalates, not from how much extra compute it spends.

Key Points

AEGIS employs a lightweight probe to detect high-risk steps in robot manipulation.
On LIBERO-Spatial, AEGIS recovers 10.1% of the trajectories the weak policy loses, versus 4.6% for budget-matched blind escalation and 5.1% for a random-trigger placebo.
The early-warning probe reaches an AUROC of 0.764 (95% CI [0.70, 0.84]) over the first 30% of trajectory steps.
The pre-registered analysis was confirmed on 700 common-random-number episodes per arm.
Control switches to the stronger policy only on flagged steps (38% of all steps), so the gain comes from timing rather than extra compute.

Article Content

From source RSS / original summary

arXiv:2606.06660v1 Announce Type: new Abstract: Long-horizon robot manipulation tends to fail gradually: one bad step degrades the state, and the policy spirals into a basin from which it cannot recover. The failure is often visible before it happens. We introduce AEGIS (Activation-probe Early-warning, Gated Inference Switching), a selective escalation method that uses a lightweight probe on a weak policy's frozen activations to detect high-risk steps while there is still time to act.
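The gating idea can be illustrated with a minimal sketch. It assumes the probe is a logistic linear classifier over the weak policy's activations and that escalation uses a fixed threshold; the abstract specifies neither, and the names `probe_risk`, `select_action`, `weights`, `bias`, and `threshold` are hypothetical.

```python
import math
from typing import Any, Callable, Sequence, Tuple


def probe_risk(activations: Sequence[float],
               weights: Sequence[float],
               bias: float) -> float:
    """Assumed probe form: linear score on frozen activations, squashed to (0, 1)."""
    z = sum(a * w for a, w in zip(activations, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def select_action(obs: Any,
                  weak_policy: Callable[[Any], Tuple[Any, Sequence[float]]],
                  strong_policy: Callable[[Any], Any],
                  weights: Sequence[float],
                  bias: float,
                  threshold: float = 0.5) -> Tuple[Any, bool]:
    """Run the weak policy, then escalate this step only if the probe flags it.

    weak_policy is assumed to return (action, activations).
    Returns (action, escalated).
    """
    weak_action, activations = weak_policy(obs)
    if probe_risk(activations, weights, bias) >= threshold:
        # High-risk step: hand this step to the stronger policy.
        return strong_policy(obs), True
    # Low-risk step: keep the weak policy's action.
    return weak_action, False
```

Because the weak policy always runs first, the probe reads activations that are already computed, and the stronger policy is called only on flagged steps.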

When the probe flags a step, control switches to a stronger separate policy, but only for the steps that need it. On LIBERO-Spatial, AEGIS recovers 10.1% of the trajectories the weak policy alone loses, versus 4.6% for budget-matched blind escalation and 5.1% for a random-trigger placebo. These gains are significant under one-sided exact paired McNemar tests with Holm-Bonferroni adjustment over three pre-registered contrasts: +5.4pp over blind escalation, p=8.5e-6; +5.0pp over random triggering, p=1.0e-4; paired-trajectory bootstrap CIs exclude zero.

AEGIS activates the stronger policy on only 38% of steps, so the lever is timing rather than compute. The probe clears its precondition with an early-window AUROC of 0.764, 95% CI [0.70, 0.84], read from the weak-policy path over the first 30% of trajectory steps before any handoff.
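For readers unfamiliar with the paired tests named above, here is a small sketch of a one-sided exact McNemar test and a Holm-Bonferroni adjustment using only the standard library. It is a generic textbook formulation, not the authors' analysis code; the function names and the discordant-pair counts `b` and `c` are illustrative.

```python
from math import comb
from typing import List, Sequence


def mcnemar_exact_one_sided(b: int, c: int) -> float:
    """One-sided exact McNemar p-value.

    b: pairs where method A succeeds and method B fails.
    c: pairs where method A fails and method B succeeds.
    Under the null, b ~ Binomial(b + c, 0.5); returns P(X >= b).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence either way
    return sum(comb(n, k) for k in range(b, n + 1)) / 2 ** n


def holm_adjust(pvals: Sequence[float]) -> List[float]:
    """Holm-Bonferroni step-down adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted
```

Only discordant pairs enter the test, which is why pairing trajectories across arms matters: episodes where both methods succeed or both fail carry no information about the contrast.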

We pre-register the full analysis plan, including a conditional recovered-task-rate estimand and explicit kill criteria, and confirm the result on 700 common-random-number episodes per arm, with nA-fail=646.
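The abstract names a conditional recovered-task-rate estimand without defining it. One plausible reading, stated here as an assumption, is the fraction of episodes the weak policy fails that a given arm completes on the same seed. The sketch below computes that rate and a percentile bootstrap interval for the difference between two arms, resampling the failed episodes as pairs; all names and the resampling scheme are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def conditional_recovery_rate(weak_success: Sequence[bool],
                              arm_success: Sequence[bool]) -> float:
    """Share of weak-policy failures that the arm turns into successes.

    Episodes are assumed to be paired by index (same seed in both arms).
    """
    failed = [i for i, ok in enumerate(weak_success) if not ok]
    if not failed:
        raise ValueError("weak policy never fails; rate is undefined")
    return sum(1 for i in failed if arm_success[i]) / len(failed)


def paired_bootstrap_diff_ci(weak_success: Sequence[bool],
                             arm_a: Sequence[bool],
                             arm_b: Sequence[bool],
                             n_boot: int = 2000,
                             alpha: float = 0.05,
                             seed: int = 0) -> Tuple[float, float]:
    """Percentile CI for rate(arm_a) - rate(arm_b) over weak-policy failures."""
    rng = random.Random(seed)
    failed = [i for i, ok in enumerate(weak_success) if not ok]
    if not failed:
        raise ValueError("weak policy never fails; nothing to resample")
    diffs: List[float] = []
    for _ in range(n_boot):
        # Resample episode indices so both arms share the same draw (paired).
        sample = [rng.choice(failed) for _ in failed]
        rate_a = sum(1 for i in sample if arm_a[i]) / len(sample)
        rate_b = sum(1 for i in sample if arm_b[i]) / len(sample)
        diffs.append(rate_a - rate_b)
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[max(0, int((1 - alpha / 2) * n_boot) - 1)]
    return lower, upper
```

An interval that excludes zero corresponds to the abstract's statement that the paired-trajectory bootstrap CIs exclude zero, under this reading of the estimand.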



