Revealing Safety-Critical Scenarios for UTM via Transformer
Quick Answer
This study presents a transformer-based reinforcement learning approach for identifying vulnerabilities in Unmanned Traffic Management (UTM) systems, achieving an 8x improvement in discovery efficiency over expert-guided testing.
Quick Take
The proposed framework uses attention mechanisms to model system states and generate targeted test scenarios, uncovering critical edge cases that traditional testing methods missed.
Key Points
- Proposed a transformer-based RL framework for UTM vulnerability discovery.
- Achieved 8x improvement in discovery efficiency over expert-guided methods.
- Introduced a Policy Model for generating targeted test scenarios.
- Utilized a risk-based reward function to guide exploration.
- Identified critical edge cases previously missed by traditional testing.
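To make the idea of a risk-based reward concrete, here is a minimal Python sketch. The summary does not say how the paper defines risk, so everything here is an assumption for illustration: the function names `min_separation` and `risk_reward`, the use of pairwise distance as the risk signal, and the 50-metre `safe_distance` threshold are all hypothetical.

```python
import math
from itertools import combinations


def min_separation(positions):
    """Smallest pairwise Euclidean distance among vehicle positions.

    Needs at least two positions; fewer raises ValueError from min().
    """
    return min(math.dist(a, b) for a, b in combinations(positions, 2))


def risk_reward(positions, safe_distance=50.0):
    """Hypothetical risk-based reward in [0, 1] (not the paper's definition).

    Returns 0.0 when every pair of vehicles is at least `safe_distance`
    apart, and rises linearly to 1.0 as the closest pair approaches
    zero separation.
    """
    closest = min_separation(positions)
    return max(0.0, 1.0 - closest / safe_distance)


# Two well-separated vehicles and one near-miss pair (x, y, z in metres).
scenario = [(0.0, 0.0, 100.0), (400.0, 0.0, 100.0), (400.0, 0.0, 110.0)]
reward = risk_reward(scenario)
```

A reward shaped this way gives the test generator a graded signal on the way to a failure, rather than a signal only when a collision actually occurs, which is one plausible reading of "guiding exploration".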
Article Excerpt
From source RSS / original summary
arXiv:2606.31114v1 Announce Type: new
Abstract: Unmanned Traffic Management (UTM) systems are cloud-based platforms designed to manage and coordinate multiple aerial vehicles remotely. UTM systems are safety-critical which cannot tolerate failures like crash or collision. To reveal latent vulnerabilities, there are neither optimal failure-exposing demonstrations nor clear reward signals. Additionally, UTM's self-healing capability introduces the "long-tail effect" of critical failures.
We propose framing UTM vulnerability discovery as a sequence modeling problem amenable to transformer-based RL architectures. Our approach leverages attention mechanisms to directly model the relationship among system states, and predict optimal actions. Our framework introduces a Policy Model that generates targeted test scenarios and an Action Sampler that enforces domain constraints. We use a risk-based reward function to guide exploration.
Through extensive evaluation on a 700-hour simulation study, we demonstrate an 8x improvement in vulnerability discovery efficiency compared to expert-guided testing. It also discovers critical edge cases that traditional methods have missed.
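The abstract describes an Action Sampler that enforces domain constraints on what the Policy Model proposes. One common way to do this is to mask disallowed actions before sampling; the sketch below illustrates that general technique in Python. It is not taken from the paper: the function `sample_action`, the action names, and the example constraint are all hypothetical.

```python
import math
import random


def sample_action(logits, allowed, rng=random):
    """Sample an action index from policy logits, restricted to `allowed`.

    Illustrative constraint masking (an assumption, not the paper's code):
    a softmax is taken over the allowed indices only, so a disallowed
    action can never be selected.
    """
    allowed = list(allowed)
    if not allowed:
        raise ValueError("no action satisfies the constraints")
    # Subtract the largest allowed logit for numerical stability.
    top = max(logits[i] for i in allowed)
    weights = [math.exp(logits[i] - top) for i in allowed]
    return rng.choices(allowed, weights=weights, k=1)[0]


# Hypothetical test-scenario actions and policy scores.
ACTIONS = ["add_vehicle", "inject_wind", "drop_link", "reroute"]
policy_logits = [2.0, 0.5, 1.5, -1.0]

# Hypothetical constraint: the fleet is already at its size limit,
# so "add_vehicle" (index 0) is not permitted in this step.
permitted = [1, 2, 3]
choice = ACTIONS[sample_action(policy_logits, permitted, random.Random(0))]
```

Masking keeps the policy's relative preferences among the permitted actions while guaranteeing that every generated step is valid for the simulator.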