WHU-Infra3D: A Full-stack Multi-modal Dataset and Benchmark for 3D Roadside Infrastructure Inventory
Quick Take
WHU-Infra3D is a multi-modal dataset for roadside infrastructure inventory, covering 53.8 km across three cities with over 175k multi-view 2D bounding boxes and over 181k attribute and status annotations. It provides baselines for five core tasks, and the evaluations reveal significant cross-city domain gaps and weak model performance on long-tailed defective statuses.
Key Points
- Dataset integrates panoramic imagery and LiDAR point clouds for precise infrastructure inventory.
- Includes over 175k multi-view 2D bounding boxes and thousands of 3D instances.
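- Provides over 181k attribute and status annotations (e.g., rust, occlusion) for operational health assessment.
- Establishes baselines for five tasks: 2D detection, 2D cross-view matching, 3D geo-identification, 3D point cloud segmentation, and attribute recognition.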
- Highlights significant cross-city domain gaps in current model performance.
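To make the 2D-3D association idea concrete, here is a minimal sketch of how multi-view 2D boxes might be linked to a single 3D instance that carries status labels. The class names, field names, the `group_views` helper, and the `street_lamp` category are all hypothetical and chosen for illustration only; the dataset's actual annotation format is not described in this summary.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Box2D:
    """One 2D box on one panoramic frame (hypothetical schema)."""
    frame_id: str
    instance_id: int  # id of the 3D instance this box is associated with
    xyxy: tuple[float, float, float, float]  # x_min, y_min, x_max, y_max in pixels


@dataclass
class Instance3D:
    """One 3D infrastructure instance with its status labels (hypothetical schema)."""
    instance_id: int
    category: str
    statuses: list[str] = field(default_factory=list)


def group_views(boxes):
    """Group 2D boxes by the 3D instance they point to, keeping input order."""
    views = defaultdict(list)
    for box in boxes:
        views[box.instance_id].append(box)
    return dict(views)


# Illustrative values only, not taken from the dataset.
instances = {7: Instance3D(7, "street_lamp", ["rust"])}
boxes = [
    Box2D("frame_0001", 7, (120.0, 40.0, 180.0, 300.0)),
    Box2D("frame_0002", 7, (90.0, 35.0, 160.0, 310.0)),
]
views = group_views(boxes)
```

With a shared instance id, every view of an object can be traced back to one 3D record, which is the property that cross-view matching and cross-frame tracking rely on.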
Article Content
From source RSS / original summary
arXiv:2606.09882v1 Announce Type: new
Abstract: The paradigm of digital twin cities is shifting from coarse visual mapping toward more precise and actionable digitization of urban assets. However, existing datasets predominantly focus on coarse visual perception, lacking the strict multi-modal alignment and attribute and status diagnosis required for automated infrastructure maintenance.
To bridge this gap, we introduce WHU-Infra3D, a large-scale, multi-modal benchmark dataset dedicated to roadside infrastructure inventory. Covering 53.8 km across three cities, WHU-Infra3D uniquely integrates panoramic imagery and LiDAR point clouds with rigorous 2D-3D instance association and cross-frame tracking. Comprising over 175k multi-view 2D bounding boxes alongside thousands of 3D infrastructure instances, the dataset provides over 181k detailed attribute and status annotations (e.g., rust, occlusion) to empower operational health assessment. We establish comprehensive baselines across five core tasks: 2D detection, 2D cross-view matching, 3D geo-identification, 3D point cloud segmentation, and attribute recognition.
Extensive evaluations expose significant cross-city domain gaps and inherent vulnerabilities of current models on long-tailed defective statuses, establishing WHU-Infra3D as an essential testbed for advancing scalable, AI-driven urban infrastructure inventory and lifecycle management. The WHU-Infra3D dataset is available at https://github.com/WHU-USI3DV/WHU-Infra3D.
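The point about long-tailed defective statuses can be illustrated with a small sketch: when a rare status such as rust is heavily outnumbered, overall accuracy can look strong while recall on the rare class stays poor. The labels, counts, and the `per_class_recall` helper below are invented for illustration and are not results or metrics from the paper.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return recall for each label that appears in y_true."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / total[label] for label in total}


# Toy labels: 8 common "normal" samples and 2 rare "rust" samples (made up).
y_true = ["normal"] * 8 + ["rust"] * 2
# A model that misses one of the two rare samples.
y_pred = ["normal"] * 8 + ["normal", "rust"]

recall = per_class_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro_recall = sum(recall.values()) / len(recall)

print(f"accuracy={accuracy:.2f}, macro recall={macro_recall:.2f}, per class={recall}")
```

Averaging recall per class weights the rare status equally with the common one, so a miss on the tail is visible in the macro score even when overall accuracy barely moves.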