The Road Ahead in Autonomous Driving: The KITScenes Multimodal Dataset
Quick Take
The KITScenes Multimodal dataset targets three gaps in existing autonomous driving datasets: sensor fidelity, map completeness, and geographic diversity. It pairs a fully synchronized high-fidelity sensor suite with HD maps in which driving-relevant traffic elements are mapped in 3D, and it introduces four benchmarks: online HD map construction, long-range depth estimation, novel view synthesis, and end-to-end driving.
Key Points
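- The KITScenes Multimodal dataset features fully synchronized high-resolution global-shutter cameras, long-range lidar (beyond 400 m), 4D imaging radar, and redundant GNSS/INS localization.
- Its HD maps are, to the authors' knowledge, the most complete of any sensor dataset, and were validated through autonomous driving trials on open-source software.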
- Traffic elements like traffic lights are mapped in 3D with full topological connectivity.
- The data was recorded in cities with irregular street layouts and mixed traffic modes, broadening the geographic diversity available alongside existing datasets.
- Introduces four benchmarks to advance spatial learning for embodied AI applications.
Article Excerpt
Source: arXiv:2606.02956v1. Abstract: Existing autonomous driving datasets have enabled major progress, but fall short in sensor fidelity, map completeness, or geographic diversity. We present KITScenes Multimodal, a European dataset built around high-fidelity sensors and maps. Our fully synchronized sensor suite combines high-resolution global-shutter cameras, long-range lidar beyond 400m, 4D imaging radar, and redundant GNSS/INS localization.
Our HD maps are, to our knowledge, the most complete of any sensor dataset, validated through autonomous driving trials on open-source software. For the first time in a public dataset, all driving-relevant traffic elements, such as traffic lights, are mapped in 3D to a reprojection-accurate level with full topological connectivity. Recorded in cities with irregular street layouts and mixed traffic modes, our dataset complements existing datasets by broadening the available geographic diversity.
We also introduce four benchmarks, each advancing spatial learning for embodied AI: online HD map construction, long-range depth estimation, novel view synthesis, and end-to-end driving. Project page: https://kitscenes.com/
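The abstract describes traffic elements as mapped in 3D "to a reprojection-accurate level". One common way to read that phrase is that a mapped 3D point, projected into a camera image, lands close to where the object actually appears. The sketch below illustrates this with a plain pinhole camera model. It is not taken from the dataset's tooling: the function names, the intrinsics (fx, fy, cx, cy), and the example coordinates are all hypothetical, and lens distortion is ignored.

```python
import math

def project_point(point_cam, fx, fy, cx, cy):
    """Project a 3D point given in camera coordinates (x right, y down,
    z forward, in meters) to pixel coordinates with a pinhole model."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)

def reprojection_error(point_cam, observed_px, fx, fy, cx, cy):
    """Pixel distance between the projected map point and the observed
    image location of the same object."""
    u, v = project_point(point_cam, fx, fy, cx, cy)
    return math.hypot(u - observed_px[0], v - observed_px[1])

# Hypothetical intrinsics and a hypothetical mapped traffic light 60 m ahead.
fx = fy = 2000.0
cx, cy = 960.0, 600.0
traffic_light_cam = (1.5, -3.0, 60.0)  # already transformed into the camera frame
observed_px = (1011.0, 499.0)          # assumed annotated pixel location

projected = project_point(traffic_light_cam, fx, fy, cx, cy)
error_px = reprojection_error(traffic_light_cam, observed_px, fx, fy, cx, cy)
print(projected, round(error_px, 3))
```

In practice the map point would first be transformed from the map frame into the camera frame using the vehicle pose and the sensor extrinsics, so a small reprojection error depends on accurate localization and calibration as well as on the map itself.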
