Overhead Wildlife Locator (OWL): Benchmarking Weakly Supervised Learning for Aerial Wildlife Surveys
Quick Answer
The Overhead Wildlife Locator (OWL) is a weakly supervised density-estimation framework for aerial wildlife surveys. Its OWL-D variant reaches 0.934 AP on the Delplanque dataset, ahead of the published HerdNet baseline at 0.840.
Quick Take
OWL trains on point-level labels rather than bounding boxes, which are reported to be up to seven times slower and three times more expensive to produce. In an operational test on the 2022 Central Arctic Caribou census, OWL-C fine-tuned on the 2017 Porcupine Caribou Herd split reached F1 = 0.965 on a held-out patch test set.
Key Points
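- OWL has three variants: OWL-C (fully convolutional, for high-throughput screening), OWL-T (a Swin-augmented hybrid for heterogeneous, cluttered scenes), and OWL-D (a frozen DINOv3 ViT-H+/16 encoder with a DPT-style fusion decoder).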
- OWL-D sets a new state of the art on the Delplanque dataset with 0.934 AP, against HerdNet's 0.840.
- OWL-T leads with 0.978 AP on the SheepCounter UAV dataset.
- OWL trains on point-level labels; the bounding-box annotations it avoids are reported to be up to seven times slower and three times more expensive to produce.
- Code, model weights, and the annotated PCH 2017 and CAH 2022 caribou patches are publicly available on GitHub.
Article Content
From source RSS / original summary
arXiv:2606.13911v1 Announce Type: new
Abstract: Automated aerial wildlife surveys increasingly rely on deep learning, yet standard object detectors require bounding-box annotations, reported to be up to seven times slower and three times more expensive to produce than point-level labels.
To address this bottleneck, we introduce the Overhead Wildlife Locator (OWL), a weakly supervised density-estimation framework with three variants: OWL-C, a fully convolutional model for high-throughput screening; OWL-T, a Swin-augmented hybrid for heterogeneous, cluttered scenes; and OWL-D, built on a frozen DINOv3 ViT-H+/16 encoder with a DPT-style fusion decoder.
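As a rough illustration of what training a density estimator from point labels can look like, the sketch below renders point annotations as a density map whose sum equals the animal count. The abstract does not say how OWL builds its training targets, so the Gaussian kernel, the `sigma` value, and the function name `points_to_density` are assumptions chosen for illustration, not the authors' method.

```python
import numpy as np

def points_to_density(points, height, width, sigma=2.0):
    """Render (row, col) point labels as a density map.

    Hypothetical helper: each point becomes a Gaussian blob normalised
    to unit mass, so the map sums to the number of annotated animals.
    """
    rows = np.arange(height)[:, None]
    cols = np.arange(width)[None, :]
    density = np.zeros((height, width), dtype=np.float64)
    for r, c in points:
        kernel = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        density += kernel / kernel.sum()  # one animal contributes mass 1
    return density

# Made-up point labels for a 64x64 patch (illustrative only).
points = [(10, 12), (40, 33), (41, 35)]
density = points_to_density(points, 64, 64)
estimated_count = density.sum()  # integrates back to the number of points
```

A model trained to regress such a map yields a count by summing its output, which is why a single click per animal can be enough supervision.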
We benchmark all three against POLO, YOLOv11n, and YOLOv11l across five public aerial datasets, from sparse fixed-wing savanna surveys to dense UAV paddock imagery, and against the published HerdNet baseline on its native Delplanque split. OWL-D sets a new state of the art on Delplanque (0.934 AP vs. HerdNet's 0.840) and records the highest AP on four of the five datasets. Performance is regime-dependent: on the extreme-density SheepCounter UAV dataset the hybrid OWL-T leads (0.978 AP) and the convolutional variants attain the lowest counting error, whereas the foundation-based OWL-D degrades, indicating which variant suits which survey type.
We further validate operational readiness on the Alaska Department of Fish and Game's 2022 Central Arctic Caribou census: under cross-herd and cross-temporal transfer, OWL-C fine-tuned on the 2017 Porcupine Caribou Herd split attains F1 = 0.965 on a held-out patch test set, with a signed count error of +3.1% aggregated across the released test patches.
We release the OWL code, model weights, and the annotated Porcupine Caribou Herd 2017 (PCH) and Central Arctic Herd 2022 (CAH) patches, the first open patch-level datasets for large-scale caribou aerial surveys, at https://github.com/microsoft/MegaDetector-Overhead.
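The two operational metrics quoted in the abstract, F1 and signed count error, can be sketched as follows. This is a minimal illustration and not the authors' evaluation code. It assumes the count error is aggregated by summing predicted and true counts over all patches before taking the relative difference, and all numbers in the example are made up.

```python
def f1_score(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

def signed_count_error(predicted_counts, true_counts):
    """Relative count error over all patches; positive means overcounting.

    Assumption: counts are summed across patches first, then compared.
    """
    total_true = sum(true_counts)
    if total_true == 0:
        raise ValueError("true counts sum to zero; relative error is undefined")
    return (sum(predicted_counts) - total_true) / total_true

# Hypothetical per-patch counts, for illustration only.
predicted = [12, 0, 31]
actual = [11, 0, 30]
error = signed_count_error(predicted, actual)
print(f"signed count error: {error:+.1%}")
print(f"F1: {f1_score(tp=8, fp=2, fn=2):.3f}")
```

A signed error keeps the direction of the bias, so the reported +3.1% indicates slight overcounting rather than undercounting.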