RAMS: Resource-Adaptive and Detection-Conditioned Model Switching for Embedded Edge Perception
Quick Answer
RAMS is a lightweight runtime controller that switches dynamically among YOLOv8 tiers on embedded devices, running up to 5.6x faster than fixed-MEDIUM inference while retaining 74% of its proxy accuracy.
Quick Take
Under heavy load, the detection-conditioned policies raise SWAS by 25.4% (oracle scoring) to 47.3% (detector-derived scoring) over threshold-only switching.
Key Points
- RAMS dynamically selects among YOLOv8 NANO, SMALL, and MEDIUM tiers without model-reload latency.
- On Jetson Orin TensorRT under heavy load, the safety2 policy achieves 3.41 ms mean latency, 5.6x faster than fixed-MEDIUM inference.
- Relative to threshold-only policies under heavy load, detection-conditioned switching improves SWAS by 25.4% under oracle scoring and 47.3% under detector-derived scoring.
- Live KITTI evaluation reports per-tier VRU recall of 24.2%, 41.2%, and 59.0%, so reactive overrides are limited by baseline detector recall.
- The same controller equations operate over a 37x latency range across Raspberry Pi 5, x86 laptops, and Jetson Orin ONNX/TensorRT deployments.
Article Content
From source RSS / original summary (arXiv:2606.14716v1). Abstract: Edge object detection on embedded hardware requires balancing inference latency and detection quality under changing resource pressure. We present RAMS, a lightweight runtime controller that monitors device pressure, calibrates switching thresholds from idle behavior, and dynamically selects among three resident YOLOv8 tiers (NANO/SMALL/MEDIUM at 320/416/640 px) without model-reload latency.
RAMS defines five switching policies, including two detection-conditioned variants that prevent aggressive downgrades after recent vulnerable-road-user (VRU) detections. We further introduce the VRU-Weighted Accuracy Score (SWAS), a scalar metric for offline policy comparison without ground-truth annotations, together with an oracle-bounded variant that separates detector circularity from genuine tier-retention benefit.
Across Raspberry Pi 5, x86 laptops, and Jetson Orin ONNX/TensorRT deployments, the same controller equations operate over a 37x latency range. On Jetson Orin TensorRT under heavy load, the safety2 policy achieves 3.41 ms mean latency, 5.6x faster than fixed-MEDIUM inference, while retaining 74% of its proxy accuracy through near-NANO operation with selective SMALL and MEDIUM locks during VRU-positive windows.
Detection-conditioned switching improves SWAS by 25.4% under oracle scoring and 47.3% under detector-derived scoring relative to threshold-only policies under heavy load. Live KITTI evaluation reports per-tier VRU recall of 24.2%, 41.2%, and 59.0%, showing that reactive overrides are fundamentally limited by baseline detector recall.
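The abstract does not give the controller equations, so the following is only a minimal sketch of the general idea it describes: thresholds calibrated from idle behavior, a pressure-based tier choice, and a detection-conditioned lock that blocks downgrades to NANO for a few frames after a VRU detection. Everything here is an assumption for illustration: the `SwitchController` class, the scalar `pressure` signal, the margin-based calibration rule, the `lock_frames` window, and the SMALL floor are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

TIERS = ("NANO", "SMALL", "MEDIUM")  # tier names from the abstract, smallest first


@dataclass
class SwitchController:
    """Hypothetical threshold controller with a VRU lock (not the paper's equations)."""
    low: float                # pressure below this allows MEDIUM
    high: float               # pressure at or above this calls for NANO
    lock_frames: int = 10     # assumed: frames the floor is held, counting the detection frame
    min_locked_tier: int = 1  # assumed: index of the lowest tier allowed while locked (SMALL)
    _lock_left: int = 0       # remaining locked frames

    @classmethod
    def calibrate(cls, idle_pressures, low_margin=1.5, high_margin=3.0):
        """Derive thresholds from idle samples (assumed rule: multiples of the idle mean)."""
        idle = sum(idle_pressures) / len(idle_pressures)
        return cls(low=idle * low_margin, high=idle * high_margin)

    def step(self, pressure, vru_detected):
        """Return the tier name to run on the next frame."""
        # Threshold-only choice: more pressure selects a smaller tier.
        if pressure < self.low:
            tier = 2
        elif pressure < self.high:
            tier = 1
        else:
            tier = 0
        # Detection-conditioned override: a VRU detection (re)starts the lock,
        # otherwise the lock counts down by one frame.
        if vru_detected:
            self._lock_left = self.lock_frames
        elif self._lock_left > 0:
            self._lock_left -= 1
        # While locked, never drop below the configured floor tier.
        if self._lock_left > 0:
            tier = max(tier, self.min_locked_tier)
        return TIERS[tier]
```

In this sketch, a controller built with `SwitchController(low=0.15, high=0.30, lock_frames=2)` returns NANO for a pressure of 0.9 with no detections, but holds SMALL on the frame where a VRU is reported and on the following frame before falling back to NANO. Because the lock only triggers on a reported detection, a missed VRU never raises the tier, which is consistent with the abstract's point that reactive overrides are bounded by detector recall.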