Multimodal Object Detection Under Sparse Forest-Canopy Occlusion

arXiv cs.CV·Nitik Jain, Mangal Kothari

Quick Take

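The paper presents a proof-of-concept multimodal pipeline for detecting humans beneath forest canopy, combining a LiDAR penetration study, visible-thermal image fusion, and synthetic-aperture imaging via Airborne Optical Sectioning (AOS).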

Key Points

Integrates LiDAR, visible-thermal fusion, and synthetic-aperture imaging.
A fine-tuned YOLOv5 achieves ~0.83 mAP on the top three classes of the Teledyne FLIR thermal dataset.
Establishes an initial baseline for UAV-deployable search-and-rescue and surveillance in forests.

[Submitted on 14 May 2026]

Abstract: Reliable detection of humans beneath forest canopy remains a difficult remote-sensing challenge due to sparse, structured, and viewpoint-dependent occlusion. This paper presents a multimodal proof-of-concept pipeline that integrates three complementary approaches: (i) experimental evaluation of LiDAR returns through vegetation to assess the feasibility of active sensing, (ii) visible-thermal image fusion using a multi-scale transform and sparse-representation framework to enhance human saliency, and (iii) synthetic-aperture image formation via Airborne Optical Sectioning (AOS) to suppress canopy clutter. A YOLOv5 detector is fine-tuned on the Teledyne FLIR thermal dataset and evaluated on thermal and fused imagery. Results show that the tested terrestrial LiDAR configuration provides limited penetration for object-level detection, while visible-thermal fusion improves target visibility in low-contrast scenes and AOS enhances ground-plane detection in synthetic forest imagery. The fine-tuned YOLOv5 achieves a mean average precision of ~0.83 on the top three FLIR classes. These findings establish an initial baseline for UAV-deployable search-and-rescue and surveillance systems operating in forested environments, and motivate future work on dedicated forest datasets and real-time multimodal integration.
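The synthetic-aperture idea in (iii) can be illustrated with a toy shift-and-average integration. This is a minimal sketch under strong assumptions, not the paper's implementation: views come from a straight flight line, so the ground plane shifts by a known whole number of pixels per view, and the canopy is reduced to a one-pixel-wide dark strip with larger parallax. The function names `simulate_views` and `integrate_aperture` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_views(ground, n_views, ground_shift_px, extra_parallax_px, start_col):
    """Create toy views: the ground shifts per view, the occluder shifts faster."""
    views, offsets = [], []
    width = ground.shape[1]
    for k in range(n_views):
        dx = k * ground_shift_px
        # np.roll returns a copy; wraparound is harmless on this uniform background.
        view = np.roll(ground, dx, axis=1)
        # Assumption: the occluder sits above the ground, so it has extra parallax.
        occ_col = (start_col + k * (ground_shift_px + extra_parallax_px)) % width
        view[:, occ_col] = 0.0  # dark strip standing in for canopy
        views.append(view)
        offsets.append(dx)
    return views, offsets


def integrate_aperture(views, offsets_px):
    """Register every view to the ground plane, then average."""
    acc = np.zeros(views[0].shape, dtype=np.float64)
    for view, dx in zip(views, offsets_px):
        acc += np.roll(view, -dx, axis=1)  # undo the ground-plane shift
    return acc / len(views)


# Toy scene: uniform background with one bright target pixel.
ground = np.full((16, 64), 0.2)
ground[8, 20] = 1.0

views, offsets = simulate_views(
    ground, n_views=8, ground_shift_px=2, extra_parallax_px=3, start_col=20
)
integral = integrate_aperture(views, offsets)

print("target in first view:", views[0][8, 20])
print("target after integration:", integral[8, 20])
```

In this toy setup the target is fully hidden in the first view (value 0.0) but is occluded in only one of eight registered views, so the integrated value at the target is 0.875. Real AOS relies on camera poses and projection onto a chosen focal plane rather than integer pixel shifts.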

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2605.15326 [cs.CV]
	(or arXiv:2605.15326v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2605.15326 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Mangal Kothari
[v1] Thu, 14 May 2026 18:39:51 UTC (7,830 KB)

— Originally published at arxiv.org
