Tiny Collaborative Inference for Occlusion-Robust Object Detection

arXiv cs.CV·Chieh-Tung Cheng, Mustafa Aslanov, Eiman Kanjo

6/3/2026

Quick Take

The study introduces a collaborative inference approach for occlusion-robust object detection on ultra-low-end devices (under 1 MB of SRAM), built on an MCUNet backbone and a YOLOv2 detection head. Decision-level fusion with Weighted Boxes Fusion (WBF) gives gains of up to +0.2736 mAP in asymmetric occlusion, and up to +0.3827 mAP when fusion is extended to three views. Each exchange costs only about 1.3 KB of communication, which makes the approach suitable for IoT deployments.
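
For readers unfamiliar with the YOLOv2 head, a minimal sketch of how it turns one anchor's raw outputs into a scored box may help. It follows the standard YOLOv2 parameterisation: the box centre is a sigmoid offset added to the cell index, the box size is the anchor prior scaled by exp of the predicted value, and the score is objectness times class probability. This is not the authors' code. The function name, the [tx, ty, tw, th, to, class logits] layout, and the assumption that anchors are given in grid-cell units are illustrative, and neither the MCUNet backbone nor TensorFlow Lite dequantisation is modelled.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def _softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_yolov2_cell(
    pred: Sequence[float],
    col: int,
    row: int,
    anchor: Tuple[float, float],
    grid_w: int,
    grid_h: int,
) -> Tuple[Tuple[float, float, float, float], float, int]:
    """Decode one anchor's raw output [tx, ty, tw, th, to, class logits...].

    Returns ((x1, y1, x2, y2), score, label) with coordinates normalised to the
    image. Hypothetical helper: assumes the anchor (pw, ph) is in grid-cell units.
    """
    tx, ty, tw, th, to = pred[:5]
    pw, ph = anchor
    # Box centre: sigmoid offset inside the cell whose corner is (col, row).
    cx = (col + _sigmoid(tx)) / grid_w
    cy = (row + _sigmoid(ty)) / grid_h
    # Box size: anchor prior scaled by exp of the predicted log-ratio.
    w = pw * math.exp(tw) / grid_w
    h = ph * math.exp(th) / grid_h
    # Score: objectness times the probability of the most likely class.
    probs = _softmax(pred[5:])
    label = max(range(len(probs)), key=probs.__getitem__)
    score = _sigmoid(to) * probs[label]
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2), score, label
```

For example, decode_yolov2_cell([0.0] * 7, col=3, row=4, anchor=(1.0, 2.0), grid_w=7, grid_h=7) places the box centre in the middle of cell (3, 4) at the anchor's own size and returns a score of 0.25 (0.5 objectness times 0.5 class probability) for label 0. Boxes decoded this way on each device are the kind of output a decision-level fusion step would combine.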

Key Points

WBF outperforms feature-level fusion, with gains of up to +0.2736 mAP in asymmetric occlusion.
Three-view fusion improves accuracy further, with gains of up to +0.3827 mAP.
Three-view fusion adds approximately 1.3 KB of communication per exchange during inference.
In a 108-frame on-device session, fused output covered 61 frames versus 47 for Board 2 alone, a +29.8% frame-level coverage gain.
An exploratory decentralized federated learning (DFL) study shows limited performance under non-IID local data.

Article Content

arXiv:2606.02894v1

Abstract: Small edge devices such as IoT surveillance nodes and search-and-rescue (SAR) platforms are increasingly expected to run computer vision locally. On ultra-low-end hardware, however, object detection is limited by available memory and compute, by communication costs when several devices cooperate, and by the loss of accuracy caused by occlusion.

This work evaluates occlusion-robust object detection on devices with less than 1 MB of SRAM by combining an MCUNet backbone, a YOLOv2 detection head, and TensorFlow Lite quantisation. We evaluate two collaborative inference strategies: feature-level fusion, which concatenates intermediate feature maps, and decision-level fusion via Weighted Boxes Fusion (WBF). Under the tested occlusion settings, WBF outperforms feature-level fusion, with gains of up to +0.2736 mAP in asymmetric occlusion scenarios.
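
Decision-level fusion can be illustrated with a minimal pure-Python sketch of the general Weighted Boxes Fusion procedure. Detections from all views are pooled and sorted by confidence; each box joins the same-class cluster whose fused box it overlaps above an IoU threshold; cluster coordinates are averaged with confidence weights; and a cluster's mean confidence is scaled by min(T, N) / N, where T is its box count and N the number of views. The Detection type, the normalised corner box format and the iou_thr=0.55 default are assumptions for illustration; this summary does not give the paper's WBF settings or implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), normalised to [0, 1]


@dataclass
class Detection:  # hypothetical container for one detected box
    box: Box
    score: float
    label: int


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def _weighted_box(cluster: List[Detection]) -> Box:
    """Confidence-weighted average of the cluster's box coordinates."""
    total = sum(d.score for d in cluster)
    return tuple(sum(d.score * d.box[k] for d in cluster) / total for k in range(4))


def weighted_boxes_fusion(
    views: Sequence[Sequence[Detection]], iou_thr: float = 0.55, skip_thr: float = 0.0
) -> List[Detection]:
    """Fuse per-view detection lists into one list (decision-level fusion)."""
    n_views = len(views)
    # 1. Pool all views (dropping boxes at or below skip_thr) and sort by confidence.
    pooled = [d for dets in views for d in dets if d.score > skip_thr]
    pooled.sort(key=lambda d: d.score, reverse=True)

    clusters: List[List[Detection]] = []
    fused_boxes: List[Box] = []
    for det in pooled:
        # 2. Find the same-class cluster whose current fused box overlaps most.
        best, best_iou = -1, iou_thr
        for i, fbox in enumerate(fused_boxes):
            if clusters[i][0].label == det.label:
                overlap = iou(fbox, det.box)
                if overlap > best_iou:
                    best, best_iou = i, overlap
        if best < 0:
            clusters.append([det])
            fused_boxes.append(det.box)
        else:
            # 3. Add the box and re-average the cluster's coordinates.
            clusters[best].append(det)
            fused_boxes[best] = _weighted_box(clusters[best])

    # 4. Mean confidence, scaled down when fewer boxes than views support a cluster.
    results = []
    for cluster, box in zip(clusters, fused_boxes):
        t = len(cluster)
        score = sum(d.score for d in cluster) / t * min(t, n_views) / n_views
        results.append(Detection(box, score, cluster[0].label))
    return sorted(results, key=lambda d: d.score, reverse=True)


if __name__ == "__main__":
    view_a = [Detection((0.10, 0.10, 0.50, 0.50), 0.9, 0)]
    view_b = [
        Detection((0.12, 0.10, 0.52, 0.50), 0.6, 0),  # same object as in view A
        Detection((0.60, 0.60, 0.90, 0.90), 0.8, 0),  # object not detected in view A
    ]
    for fused in weighted_boxes_fusion([view_a, view_b]):
        print(fused)
```

In the usage example, the two overlapping boxes merge into one box at roughly (0.108, 0.1, 0.508, 0.5) with score 0.75, while the box seen only by view B (for instance because the object is occluded in view A) is kept with its 0.8 score halved to 0.4. Only box coordinates, scores and labels need to be exchanged, rather than intermediate feature maps.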

Extending fusion to three views improves accuracy further (up to +0.3827 mAP) while adding communication overhead (approximately 1.3 KB per exchange). The hardware experiments start with a host-assisted USB-relay baseline and then move to a Wi-Fi peer-to-peer deployment on two Coral Dev Board Micro units, where WBF runs on-device and communication energy remains small relative to inference. In a representative 301.9 s autonomous session comprising 108 frames, fused output is observed on 61 frames compared with 47 for Board 2 alone, a frame-level coverage gain of +29.8%. We also include a small exploratory decentralised federated learning (DFL) feasibility note, but do not treat it as a main result because performance remains limited under non-IID local data.
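
The coverage figure follows directly from the reported frame counts: the relative gain is (fused frames - single-board frames) / single-board frames. The short check below reproduces it; the function and constant names are illustrative, and only the 61, 47 and 108 frame counts come from the paper.

```python
def coverage_gain_pct(fused_frames: int, single_frames: int) -> float:
    """Relative frame-level coverage gain of fused output over one board, in percent."""
    if single_frames <= 0:
        raise ValueError("single_frames must be positive")
    return (fused_frames - single_frames) / single_frames * 100.0


# Frame counts reported for the 108-frame autonomous session.
SESSION_FRAMES = 108
FUSED_FRAMES = 61
BOARD2_FRAMES = 47

print(f"coverage gain: {coverage_gain_pct(FUSED_FRAMES, BOARD2_FRAMES):+.1f}%")  # +29.8%
print(f"fused output:  {FUSED_FRAMES / SESSION_FRAMES:.1%} of frames")         # 56.5%
print(f"Board 2 alone: {BOARD2_FRAMES / SESSION_FRAMES:.1%} of frames")        # 43.5%
```

Note that +29.8% is a relative gain over Board 2's 47 frames; in absolute terms, fused output covers 14 more of the 108 frames.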

The results support decision-level fusion as a viable option for improving occlusion robustness in small-scale edge object detection, including host-free multi-board operation on ultra-low-end hardware.
