Multimodal Object Detection Under Sparse Forest-Canopy Occlusion
Quick Take
The paper presents a multimodal pipeline for detecting humans under forest canopies using LiDAR and image fusion techniques.
Key Points
- Integrates LiDAR, visible-thermal fusion, and synthetic-aperture imaging.
- Fine-tuned YOLOv5 achieves ~0.83 mAP on the top three classes of the Teledyne FLIR thermal dataset.
- Establishes a baseline for UAV search-and-rescue in forests.
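The paper describes its visible-thermal fusion as a multi-scale transform combined with sparse representation. The exact decomposition and fusion rules are not given here. As a rough illustration of the multi-scale part only, the sketch below makes several assumptions that do not come from the paper. It uses a simple two-scale (base/detail) decomposition with a box filter, averages the two base layers, and keeps the larger-magnitude detail coefficient at each pixel, a common baseline rule in transform-domain fusion. The sparse-representation step is omitted. Both inputs are assumed co-registered, equally sized, and normalized to [0, 1]. `box_blur` and `two_scale_fuse` are hypothetical names.

```python
from typing import List

Image = List[List[float]]  # grayscale image: rows of floats in [0, 1]


def box_blur(img: Image, radius: int) -> Image:
    """Mean filter over a (2*radius+1)^2 window, truncated at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total, count = 0.0, 0
            for rr in range(max(0, r - radius), min(h, r + radius + 1)):
                for cc in range(max(0, c - radius), min(w, c + radius + 1)):
                    total += img[rr][cc]
                    count += 1
            out[r][c] = total / count
    return out


def two_scale_fuse(visible: Image, thermal: Image, radius: int = 1) -> Image:
    """Fuse co-registered visible and thermal images (hypothetical sketch).

    Each input is split into a base layer (box-blurred image) and a detail
    layer (image minus base). Base layers are averaged; per pixel, the detail
    coefficient with the larger magnitude is kept (max-absolute rule).
    """
    base_v, base_t = box_blur(visible, radius), box_blur(thermal, radius)
    fused: Image = []
    for r in range(len(visible)):
        row = []
        for c in range(len(visible[0])):
            detail_v = visible[r][c] - base_v[r][c]
            detail_t = thermal[r][c] - base_t[r][c]
            base = 0.5 * (base_v[r][c] + base_t[r][c])
            detail = detail_v if abs(detail_v) >= abs(detail_t) else detail_t
            row.append(min(1.0, max(0.0, base + detail)))  # clip to [0, 1]
        fused.append(row)
    return fused
```

For example, take a uniform visible frame (0.3 everywhere) and a thermal frame with a single hot pixel. The fused result keeps the hot spot at full intensity, while the background settles at the average of the two base layers (about 0.15). This is a toy version of the saliency boost the fusion step is meant to provide in low-contrast scenes.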
Abstract: Reliable detection of humans beneath forest canopy remains a difficult remote-sensing challenge due to sparse, structured, and viewpoint-dependent occlusion. This paper presents a multimodal proof-of-concept pipeline that integrates three complementary approaches: (i) experimental evaluation of LiDAR returns through vegetation to assess the feasibility of active sensing, (ii) visible-thermal image fusion using a multi-scale transform and sparse-representation framework to enhance human saliency, and (iii) synthetic-aperture image formation via Airborne Optical Sectioning (AOS) to suppress canopy clutter. A YOLOv5 detector is fine-tuned on the Teledyne FLIR thermal dataset and evaluated on thermal and fused imagery. Results show that the tested terrestrial LiDAR configuration provides limited penetration for object-level detection, while visible-thermal fusion improves target visibility in low-contrast scenes and AOS enhances ground-plane detection in synthetic forest imagery. The fine-tuned YOLOv5 achieves a mean average precision of ~0.83 on the top three FLIR classes. These findings establish an initial baseline for UAV-deployable search-and-rescue and surveillance systems operating in forested environments, and motivate future work on dedicated forest datasets and real-time multimodal integration.
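Airborne Optical Sectioning, as generally described, forms an integral image by registering many aerial views to a chosen focal plane and averaging them. Points on that plane (for example, the forest floor) line up across views and reinforce each other. Nearer occluders such as branches and foliage land on different pixels in each view and are smeared out. The minimal pure-Python sketch below relies on simplifying assumptions that are not taken from the paper:

- identical pinhole cameras translated along one image axis only,
- whole-pixel shifts,
- a planar focal surface at a known depth.

`integrate_aos` and the toy-scene helper `make_synthetic_views` are hypothetical names, and this is not the authors' implementation.

```python
from typing import List, Sequence

Image = List[List[float]]  # grayscale image: rows of floats


def integrate_aos(images: Sequence[Image], cam_x: Sequence[float],
                  focal_px: float, depth: float) -> Image:
    """Synthetic-aperture integral focused on a plane at `depth` (sketch).

    Assumes pinhole cameras with identical orientation, translated only along
    the image x-axis (positions `cam_x`, same units as `depth`). A point on the
    focal plane seen at column c in the reference view (cam_x[0]) appears at
    column c - focal_px * (x - cam_x[0]) / depth in the view taken at x.
    """
    h, w = len(images[0]), len(images[0][0])
    acc = [[0.0] * w for _ in range(h)]
    cnt = [[0] * w for _ in range(h)]
    x_ref = cam_x[0]
    for img, x in zip(images, cam_x):
        disparity = round(focal_px * (x - x_ref) / depth)  # whole-pixel shift
        for r in range(h):
            for c in range(w):
                src = c - disparity
                if 0 <= src < w:
                    acc[r][c] += img[r][src]
                    cnt[r][c] += 1
    # Average only over the views that actually observe each output pixel.
    return [[acc[r][c] / cnt[r][c] if cnt[r][c] else 0.0 for c in range(w)]
            for r in range(h)]


def make_synthetic_views(cam_x: Sequence[float], width: int, target_px: int,
                         focal_px: float, ground_depth: float,
                         occluder_depth: float) -> List[Image]:
    """Hypothetical one-row test scene: a bright ground target (1.0) hidden in
    the reference view by a nearer occluder (0.5) directly in front of it."""
    views = []
    for x in cam_x:
        row = [0.0] * width
        t = target_px - round(focal_px * (x - cam_x[0]) / ground_depth)
        o = target_px - round(focal_px * (x - cam_x[0]) / occluder_depth)
        if 0 <= t < width:
            row[t] = 1.0
        if 0 <= o < width:
            row[o] = 0.5  # occluder is closer to the camera, so it overwrites
        views.append([row])
    return views
```

In a five-view toy scene (`cam_x = [0, 1, 2, 3, 4]`, `focal_px = 10`, ground at depth 10, occluder at depth 5), the target pixel reads 0.5 in the reference view because the occluder covers it. The integral focused at the ground depth recovers 0.9 at that pixel, since four of the five views see past the occluder. Refocusing at the occluder depth brings the occluder back into focus instead.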
| Subjects: | Computer Vision and Pattern Recognition (cs.CV) |
| Cite as: | arXiv:2605.15326 [cs.CV] |
| | (or arXiv:2605.15326v1 [cs.CV] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.15326 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Mangal Kothari
[v1] Thu, 14 May 2026 18:39:51 UTC (7,830 KB)
Originally published at arxiv.org