Tiny Collaborative Inference for Occlusion-Robust Object Detection
Quick Take
The study introduces a collaborative inference approach for occlusion-robust object detection on ultra-low-end devices (less than 1 MB SRAM), using an MCUNet backbone and a YOLOv2 head. Decision-level fusion via Weighted Boxes Fusion (WBF) gives gains of up to +0.2736 mAP under asymmetric occlusion, and up to +0.3827 mAP when fusion is extended to three views. Communication overhead stays low, at approximately 1.3 KB per exchange, which makes the method a candidate for IoT applications.
Key Points
- WBF (decision-level fusion) outperforms feature-level fusion, with gains of up to +0.2736 mAP in asymmetric occlusion scenarios.
- Three-view fusion improves accuracy further, by up to +0.3827 mAP.
- Communication overhead is approximately 1.3 KB per exchange during inference.
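- On two Coral Dev Board Micro units, fused output was observed on 61 of 108 frames versus 47 for Board 2 alone, a frame-level coverage gain of +29.8%.
- An exploratory decentralised federated learning (DFL) note is included but not treated as a main result, because performance remains limited under non-iid local data.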
Article Content
From the source RSS summary (arXiv:2606.02894v1, announce type: new). Abstract: Small edge devices such as IoT surveillance nodes and search-and-rescue (SAR) platforms are increasingly expected to run computer vision locally. On ultra-low-end hardware, however, object detection is limited by available memory and compute, by communication costs when several devices cooperate, and by the loss of accuracy caused by occlusion.
The work evaluates occlusion-robust object detection on devices with less than 1 MB SRAM by combining an MCUNet backbone, a YOLOv2 detection head, and TensorFlow Lite quantisation. We evaluate two collaborative inference strategies: feature-level fusion, which concatenates intermediate feature maps, and decision-level fusion via Weighted Boxes Fusion (WBF). Under the tested occlusion settings, WBF outperforms feature-level fusion and gives gains of up to +0.2736 mAP in asymmetric occlusion scenarios.
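To make the decision-level strategy concrete, the following is a minimal sketch of Weighted Boxes Fusion across several views. It is not the authors' implementation: the `Box` type, the function names, the `iou_thr=0.55` default, and equal weighting of views are all illustrative assumptions, and it further assumes that every view's boxes have already been mapped into a shared image coordinate frame, which the abstract does not describe.

```python
from dataclasses import dataclass


@dataclass
class Box:
    # Corner coordinates in a shared frame (assumption), plus score and class.
    x1: float
    y1: float
    x2: float
    y2: float
    score: float
    label: int


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def fuse_cluster(cluster: list, num_views: int) -> Box:
    """Score-weighted average of coordinates; mean score, scaled down
    when fewer views than num_views contributed to the cluster."""
    total = sum(b.score for b in cluster)

    def wavg(attr: str) -> float:
        return sum(getattr(b, attr) * b.score for b in cluster) / total

    score = (total / len(cluster)) * min(len(cluster), num_views) / num_views
    return Box(wavg("x1"), wavg("y1"), wavg("x2"), wavg("y2"), score, cluster[0].label)


def weighted_boxes_fusion(views: list, iou_thr: float = 0.55) -> list:
    """Fuse per-view detections (a list of lists of Box) into one list."""
    num_views = len(views)
    # Visit all positive-score boxes from highest to lowest confidence.
    boxes = sorted(
        (b for view in views for b in view if b.score > 0),
        key=lambda b: b.score,
        reverse=True,
    )
    clusters = []  # boxes grouped as the same object
    fused = []     # running fused box per cluster, used for matching
    for box in boxes:
        best, best_iou = None, iou_thr
        for i, f in enumerate(fused):
            if f.label != box.label:
                continue
            overlap = iou(f, box)
            if overlap > best_iou:
                best, best_iou = i, overlap
        if best is None:
            clusters.append([box])
            fused.append(fuse_cluster([box], num_views))
        else:
            clusters[best].append(box)
            fused[best] = fuse_cluster(clusters[best], num_views)
    return fused


if __name__ == "__main__":
    view_a = [Box(10, 10, 50, 50, 0.9, 0)]
    view_b = [Box(12, 12, 52, 52, 0.6, 0), Box(100, 100, 140, 140, 0.8, 1)]
    for b in weighted_boxes_fusion([view_a, view_b]):
        print(b)
```

In this sketch a box seen by only one view is kept but its score is scaled down, so an object occluded in one view can still appear in the fused output. Only box coordinates, scores, and labels need to be exchanged between devices, rather than intermediate feature maps.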
Extending fusion to three views improves accuracy further (up to +0.3827 mAP) while adding communication overhead (approximately 1.3 KB per exchange). The hardware experiments start with a host-assisted USB-relay baseline and then move to a Wi-Fi peer-to-peer deployment on two Coral Dev Board Micro units, where WBF runs on-device and communication energy remains small relative to inference.
In a representative 301.9 s autonomous session comprising 108 frames, fused output is observed on 61 frames compared with 47 for Board 2 alone, a frame-level coverage gain of +29.8%. We also include a small exploratory decentralised federated learning (DFL) feasibility note, but do not treat it as a main result because performance remains limited under non-iid local data.
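The frame-level coverage gain quoted for the autonomous session can be reproduced from the reported frame counts. The short sketch below treats the gain as the relative increase in frames with output, which is the reading consistent with the reported +29.8%; the function name is illustrative.

```python
def coverage_gain(fused_frames: int, single_frames: int) -> float:
    """Relative increase in frames with output, fused versus a single board."""
    return (fused_frames - single_frames) / single_frames


total_frames = 108   # frames in the 301.9 s session
fused_frames = 61    # frames with fused output
board2_frames = 47   # frames with output from Board 2 alone

print(f"fused coverage:   {fused_frames / total_frames:.1%}")
print(f"Board 2 coverage: {board2_frames / total_frames:.1%}")
print(f"relative gain:    {coverage_gain(fused_frames, board2_frames):+.1%}")
```

The gain is therefore relative to Board 2's frame count, not a difference in percentage points of the 108-frame session.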
The results support decision-level fusion as a viable option for improving occlusion robustness in small-scale edge object detection, including host-free multi-board operation on ultra-low-end hardware.
