GeoDrive-Bench: Benchmarking Region-Specific Multimodal Reasoning in Autonomous Driving
Quick Take
GeoDrive-Bench is a benchmark for evaluating vision-language models (VLMs) in autonomous driving, focusing on region-specific traffic rules across six countries. Using 5,053 multiple-choice QA pairs, it shows that the performance of nine state-of-the-art VLMs varies substantially across driving cultures, highlighting the need for better geo-cultural reasoning in autonomous systems.
Key Points
- GeoDrive-Bench features 5,053 human-validated QA pairs across six countries.
- Focuses on four driving tasks: perception, prediction, planning, and region reasoning.
- Nine state-of-the-art VLMs vary substantially in performance across driving cultures on each task.
- A distillation algorithm enhances VLMs with region-specific traffic-rule knowledge.
- Results indicate current VLMs lack robust region-aware driving intelligence.
Article Content
From the source RSS summary (arXiv:2606.02774v1). Abstract: Vision-language models (VLMs) for autonomous driving have shown promising performance, but their ability to handle region-specific traffic rules remains underexplored, raising uncertainties about their deployment across diverse global settings. We therefore introduce GeoDrive-Bench, a novel benchmark that enables the systematic investigation of VLMs' geo-culturally grounded driving reasoning.
We curated 5,053 human-validated multiple-choice QA pairs across six countries covering diverse driving cultures. Specifically, we emphasize four driving tasks: perception, prediction, planning, and region reasoning. Each question requires models to infer the correct driving behavior from visual evidence and local traffic conventions without explicit country labels.
Beyond evaluation, we further design a distillation algorithm that injects region-specific traffic-rule knowledge into the internal representations of VLMs, enabling models to better align visual scene understanding with local driving policies. Experiments on nine state-of-the-art VLMs show substantial performance variations across geo-driving cultures for each task, while our proposed baseline models exhibit improved geo-cultural reasoning across regions.
These results suggest that current VLMs still lack robust region-aware driving intelligence and highlight GeoDrive-Bench as a diagnostic and training-oriented testbed for deployable autonomous driving foundation models.
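The per-region, per-task comparison described in the abstract can be illustrated with a small scoring sketch. This is not the authors' evaluation code: the record fields (`country`, `task`, `answer`, `prediction`), the placeholder country names, and the toy data are all assumptions made for illustration, since the summary does not specify the dataset schema or list the six countries. It computes multiple-choice accuracy for each (country, task) pair and then, for each task, the gap between the best and worst country.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Return {(country, task): accuracy} for multiple-choice predictions."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        key = (r["country"], r["task"])
        total[key] += 1
        correct[key] += int(r["prediction"] == r["answer"])
    return {key: correct[key] / total[key] for key in total}

def spread_per_task(acc):
    """For each task, return the gap between best and worst country accuracy."""
    by_task = defaultdict(list)
    for (_country, task), value in acc.items():
        by_task[task].append(value)
    return {task: max(values) - min(values) for task, values in by_task.items()}

# Hypothetical toy records; field names and country labels are placeholders.
records = [
    {"country": "country_A", "task": "perception", "answer": "B", "prediction": "B"},
    {"country": "country_A", "task": "perception", "answer": "C", "prediction": "A"},
    {"country": "country_B", "task": "perception", "answer": "A", "prediction": "A"},
    {"country": "country_B", "task": "perception", "answer": "D", "prediction": "D"},
    {"country": "country_A", "task": "planning", "answer": "A", "prediction": "A"},
    {"country": "country_B", "task": "planning", "answer": "B", "prediction": "C"},
]

acc = accuracy_by_group(records)
spread = spread_per_task(acc)

for (country, task), value in sorted(acc.items()):
    print(f"{country:10s} {task:12s} accuracy={value:.2f}")
for task, gap in sorted(spread.items()):
    print(f"{task:12s} best-worst gap={gap:.2f}")
```

A large best-worst gap for a task would correspond to the kind of cross-region variation the abstract reports; the real benchmark would of course use its own data and possibly different metrics.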
