WHU-Infra3D: A Full-stack Multi-modal Dataset and Benchmark for 3D Roadside Infrastructure Inventory

arXiv cs.CV·Chong Liu, Luxuan Fu, Xuyu Feng, Zhen Dong, Bisheng Yang

·~1 min·6/10/2026·en·0

Quick Answer

WHU-Infra3D is a large-scale multi-modal dataset for roadside infrastructure inventory, covering 53.8 km across three cities with over 175k multi-view 2D bounding boxes and over 181k attribute and status annotations.

Quick Take

The benchmark establishes baselines for five core tasks. Evaluations expose significant cross-city domain gaps and weak performance of current models on long-tailed defective statuses, positioning the dataset as a testbed for AI-driven urban infrastructure inventory and lifecycle management.

Key Points

Dataset integrates panoramic imagery and LiDAR point clouds for precise infrastructure inventory.
Includes over 175k multi-view 2D bounding boxes and thousands of 3D instances.
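Provides over 181k attribute and status annotations (e.g., rust, occlusion) for operational health assessment.
Establishes baselines for five tasks: 2D detection, 2D cross-view matching, 3D geo-identification, 3D point cloud segmentation, and attribute recognition.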
Highlights significant cross-city domain gaps in current model performance.

Article Content

From source RSS / original summary

arXiv:2606.09882v1 (Announce Type: new)

Abstract: The paradigm of digital twin cities is shifting from coarse visual mapping toward more precise and actionable digitization of urban assets. However, existing datasets predominantly focus on coarse visual perception, lacking the strict multi-modal alignment and attribute and status diagnosis required for automated infrastructure maintenance.

To bridge this gap, we introduce WHU-Infra3D, a large-scale, multi-modal benchmark dataset dedicated to roadside infrastructure inventory. Covering 53.8 km across three cities, WHU-Infra3D uniquely integrates panoramic imagery and LiDAR point clouds with rigorous 2D-3D instance association and cross-frame tracking. Comprising over 175k multi-view 2D bounding boxes alongside thousands of 3D infrastructure instances, the dataset provides over 181k detailed attribute and status annotations (e.g., rust, occlusion) to empower operational health assessment.

We establish comprehensive baselines across five core tasks: 2D detection, 2D cross-view matching, 3D geo-identification, 3D point cloud segmentation, and attribute recognition.
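To make the idea of 2D-3D instance association between LiDAR points and panoramic imagery more concrete, here is a minimal sketch. It is not the authors' pipeline: it assumes an equirectangular panorama, a point already expressed in the camera frame (x forward, y left, z up), and axis-aligned pixel boxes. The function names `project_to_panorama` and `boxes_containing` and the box IDs are hypothetical.

```python
import math

def project_to_panorama(point, width, height):
    """Map a 3D point in the (assumed) camera frame to equirectangular pixels.

    Assumed convention: x forward, y left, z up; the image centre looks
    along +x; u grows to the right, v grows downward.
    """
    x, y, z = point
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("point coincides with the camera centre")
    azimuth = math.atan2(y, x)      # (-pi, pi], positive toward the left
    elevation = math.asin(z / r)    # [-pi/2, pi/2], positive upward
    u = (0.5 - azimuth / (2.0 * math.pi)) * width
    v = (0.5 - elevation / math.pi) * height
    return u % width, v             # wrap u around the 360-degree seam

def boxes_containing(point, boxes, width, height):
    """Return the IDs of 2D boxes (x_min, y_min, x_max, y_max) hit by the point."""
    u, v = project_to_panorama(point, width, height)
    return [
        box_id
        for box_id, (x0, y0, x1, y1) in boxes.items()
        if x0 <= u <= x1 and y0 <= v <= y1
    ]

# Toy example with made-up boxes (not taken from the dataset).
boxes = {"pole_01": (1000, 480, 1050, 540), "sign_07": (100, 100, 200, 200)}
print(project_to_panorama((1.0, 0.0, 0.0), 2048, 1024))  # (1024.0, 512.0)
print(boxes_containing((1.0, 0.0, 0.0), boxes, 2048, 1024))  # ['pole_01']
```

A real association step would also need the LiDAR-to-camera extrinsics, occlusion handling, and boxes that straddle the panorama seam, none of which are covered here.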

Extensive evaluations expose significant cross-city domain gaps and inherent vulnerabilities of current models on long-tailed defective statuses, establishing WHU-Infra3D as an essential testbed for advancing scalable, AI-driven urban infrastructure inventory and lifecycle management. The WHU-Infra3D dataset is available at https://github.com/WHU-USI3DV/WHU-Infra3D.
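Why long-tailed statuses are hard to evaluate can be illustrated with a small sketch. This summary does not state which metrics the paper reports, so the comparison below between overall accuracy and macro-averaged recall is only an assumption about how such a gap might be surfaced. The labels are toy values invented for the example, not dataset results.

```python
from collections import defaultdict

def per_class_recall(y_true, y_pred):
    """Recall for each ground-truth class present in y_true."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}

def overall_accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly, dominated by frequent classes."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall, so rare classes count equally."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)

# Toy labels: a frequent "normal" status and a rare "rust" status.
y_true = ["normal"] * 8 + ["rust"] * 2
y_pred = ["normal"] * 8 + ["normal", "rust"]

print(overall_accuracy(y_true, y_pred))  # 0.9
print(macro_recall(y_true, y_pred))      # 0.75
```

In this toy case the overall accuracy looks high even though half of the rare "rust" samples are missed, which is the kind of weakness a class-balanced metric makes visible.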
