PMOF: A Dataset and Benchmark for Passenger Monitoring Using Overhead Fisheye Cameras
Quick Answer
PMOF is a dataset of more than 19,000 manually annotated overhead fisheye frames for passenger monitoring in moving vehicles. YOLO26m-obb models fine-tuned across datasets that include it reach 94.8% AP50 on PMOF.
Quick Take
The benchmark exposes the domain gap between static and moving environments: combining PMOF with existing overhead fisheye datasets improves person detection on PMOF and generalizes to an unseen overhead fisheye dataset from a different domain. Beyond detection, the annotations also support tracking and action recognition.
Key Points
- PMOF is the first public dataset of top-view fisheye imagery captured inside a moving vehicle.
- The dataset includes 19k+ manually annotated frames with rotated bounding boxes, tracking IDs, and action labels.
- Cross-domain fine-tuning with custom rotation-aware augmentation achieved 94.8% AP50 on PMOF and 96.5% AP50 on an unseen overhead fisheye dataset from a different domain.
- Incorporating PMOF improves detection performance and generalization to broader fisheye-based person detection tasks.
- The dataset and code are publicly available at https://swermuth.github.io/pmof/.
Article Content
Source: arXiv:2606.13910v1 (announce type: new)
Abstract: Autonomous staff-free public transport requires reliable in-vehicle passenger monitoring. However, perception inside moving vehicles is challenged by confined spaces, variable illumination, motion-induced background variation, occlusion, and limited viewpoints. To mitigate these spatial constraints, ceiling-mounted fisheye cameras provide full-scene coverage from a single viewpoint.
Yet existing public overhead fisheye datasets are recorded in static environments and do not capture the domain shift introduced by vehicle motion. To fill this gap, we introduce PMOF, Passenger Monitoring using Overhead Fisheye cameras, the first public dataset of top-view fisheye imagery captured inside a moving vehicle, comprising over 19k manually annotated frames. PMOF provides rotated bounding boxes, tracking identifiers, and action labels, supporting object detection, tracking, and action recognition.
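To make the three annotation types concrete, here is a minimal Python sketch of what one per-person record could look like. The class name, field names, angle convention, and the example action label are all hypothetical; the abstract does not describe PMOF's actual file format, so treat this as an illustration of the idea rather than the dataset's schema. The `corners` helper shows the standard conversion from a center/size/angle box to its four corner points.

```python
import math
from dataclasses import dataclass


@dataclass
class PassengerAnnotation:
    """One person in one frame (hypothetical schema, NOT PMOF's real format)."""
    frame_id: int
    track_id: int      # identity that stays constant across frames (tracking)
    action: str        # action label; the label names here are illustrative
    cx: float          # box center x in pixels
    cy: float          # box center y in pixels
    w: float           # box width in pixels
    h: float           # box height in pixels
    angle_deg: float   # assumed: angle of the width axis from the image x-axis

    def corners(self):
        """Return the four corners of the rotated box as (x, y) tuples."""
        a = math.radians(self.angle_deg)
        ux, uy = math.cos(a), math.sin(a)   # unit vector along the width axis
        vx, vy = -uy, ux                    # unit vector along the height axis
        hw, hh = self.w / 2.0, self.h / 2.0
        return [
            (self.cx + sx * hw * ux + sy * hh * vx,
             self.cy + sx * hw * uy + sy * hh * vy)
            for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1))
        ]


# Example record: person with track ID 7, axis-aligned box.
ann = PassengerAnnotation(0, 7, "sitting", 100.0, 100.0, 40.0, 20.0, 0.0)
```

A rotated box matters in overhead fisheye views because people appear at arbitrary orientations around the image center, so an axis-aligned box would include a lot of background.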
We benchmark PMOF using YOLO26m-obb models fine-tuned under multiple dataset configurations that combine PMOF with existing overhead fisheye datasets. Cross-domain fine-tuning with custom rotation-aware augmentation achieves 94.8% AP50 on PMOF and 96.5% AP50 on an unseen overhead fisheye dataset from a different domain.
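The abstract does not specify how the custom rotation-aware augmentation works. As a generic sketch of the underlying idea, the Python function below shows how a rotated box label can be kept consistent when an overhead image is rotated about its center. The function name, the `(cx, cy, w, h, theta)` box layout in degrees, and the assumption that the rotation center is the image center are all illustrative choices, not the authors' method.

```python
import math


def rotate_obb(box, angle_deg, img_w, img_h):
    """Rotate a box label (cx, cy, w, h, theta_deg) about the image center.

    Assumed convention: a point (x, y) maps to the image center plus the
    standard 2D rotation of its offset. In pixel coordinates (y pointing
    down) a positive angle therefore looks clockwise on screen.
    """
    cx, cy, w, h, theta = box
    a = math.radians(angle_deg)
    ox, oy = img_w / 2.0, img_h / 2.0      # rotation center (assumption)
    dx, dy = cx - ox, cy - oy
    new_cx = ox + math.cos(a) * dx - math.sin(a) * dy
    new_cy = oy + math.sin(a) * dx + math.cos(a) * dy
    # Width and height are unchanged; only the orientation shifts.
    # A rectangle is symmetric under 180 degrees, so wrap into [0, 180).
    new_theta = (theta + angle_deg) % 180.0
    return (new_cx, new_cy, w, h, new_theta)


# Example: a box to the right of the center of a 200x200 image.
rotated = rotate_obb((150.0, 100.0, 40.0, 20.0, 0.0), 90.0, 200, 200)
```

The image pixels would have to be rotated with the same transform for the label to stay aligned; that step is omitted here. Because the distance of the box from the center is preserved, a rotation of this kind keeps a person at the same radial position in the fisheye view, which is why rotation is a natural augmentation for top-view imagery.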
Our results highlight the domain gap between static and moving environments and show that incorporating PMOF improves detection performance and advances generalization beyond passenger monitoring to broader fisheye-based person detection tasks. The dataset and code are available at https://swermuth.github.io/pmof/.