GeoDrive-Bench: Benchmarking Region-Specific Multimodal Reasoning in Autonomous Driving

arXiv cs.CV·Yingzi Ma, Chaowei Xiao, Ming Jiang

6/3/2026·en

Quick Take

GeoDrive-Bench is a benchmark for evaluating vision-language models (VLMs) in autonomous driving, focused on region-specific traffic rules across six countries. Its 5,053 human-validated multiple-choice QA pairs expose substantial performance variations across driving cultures among nine state-of-the-art VLMs, pointing to a need for better geo-cultural reasoning in autonomous systems.

Key Points

GeoDrive-Bench features 5,053 human-validated QA pairs across six countries.
Focuses on four driving tasks: perception, prediction, planning, and region reasoning.
Nine state-of-the-art VLMs show substantial performance variations across driving cultures on each task.
A distillation algorithm enhances VLMs with region-specific traffic-rule knowledge.
Results indicate current VLMs lack robust region-aware driving intelligence.

Article Content

From source RSS / original summary

arXiv:2606.02774v1. Abstract: Vision-language models (VLMs) for autonomous driving have shown promising performance, but their ability to handle region-specific traffic rules remains underexplored, raising uncertainties about their deployment across diverse global settings. We therefore introduce GeoDrive-Bench, a novel benchmark that enables the systematic investigation of VLMs' geo-culturally grounded driving reasoning.

We curated 5,053 human-validated multiple-choice QA pairs across six countries covering diverse driving cultures. Specifically, we emphasize four driving tasks: perception, prediction, planning, and region reasoning. Each question requires models to infer the correct driving behavior from visual evidence and local traffic conventions without explicit country labels.
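A benchmark of this shape (multiple-choice questions grouped by country and task) is usually scored as accuracy per group, and the spread across countries is one way to quantify regional variation. The sketch below illustrates that bookkeeping only. The record keys (`country`, `task`, `answer`, `prediction`), the function names, and the toy data are hypothetical and are not taken from the GeoDrive-Bench release.

```python
from collections import defaultdict


def accuracy_breakdown(records):
    """Return {(country, task): accuracy} for multiple-choice predictions.

    Each record is a dict with hypothetical keys 'country', 'task',
    'answer' (gold option letter) and 'prediction' (model option letter).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        key = (r["country"], r["task"])
        total[key] += 1
        # Compare option letters case-insensitively, ignoring whitespace.
        if r["prediction"].strip().upper() == r["answer"].strip().upper():
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


def regional_gap(breakdown, task):
    """Best minus worst per-country accuracy for a single task."""
    scores = [acc for (_, t), acc in breakdown.items() if t == task]
    return max(scores) - min(scores)


# Toy records: country labels and answers are made up for illustration.
records = [
    {"country": "A", "task": "planning", "answer": "B", "prediction": "B"},
    {"country": "A", "task": "planning", "answer": "C", "prediction": "A"},
    {"country": "B", "task": "planning", "answer": "D", "prediction": "D"},
    {"country": "B", "task": "planning", "answer": "A", "prediction": "a"},
]
breakdown = accuracy_breakdown(records)
print(breakdown)
print(regional_gap(breakdown, "planning"))
```

Reporting the gap alongside the mean keeps a model that is strong on average but weak in one region from looking uniformly good.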

Beyond evaluation, we further design a distillation algorithm that injects region-specific traffic-rule knowledge into the internal representations of VLMs, enabling models to better align visual scene understanding with local driving policies. Experiments on nine state-of-the-art VLMs show substantial performance variations across geo-driving cultures for each task, while our proposed baseline models exhibit improved geo-cultural reasoning across regions.
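The abstract does not specify how the distillation algorithm works. As a rough illustration of the general idea of pulling a model's internal representation toward a rule-aware one, the sketch below shows generic feature distillation: a task loss on the answer options plus a mean-squared-error term between a student hidden vector and a teacher hidden vector. This is a standard technique swapped in for illustration, not the paper's method. All names (`student_hidden`, `teacher_hidden`, `weight`) are hypothetical, and the teacher is assumed to be some model that has access to the local traffic rules.

```python
import math


def cross_entropy(logits, gold_index):
    """Cross-entropy for one multiple-choice question from raw option logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[gold_index]


def feature_mse(student_hidden, teacher_hidden):
    """Mean squared error between two equal-length hidden vectors."""
    if len(student_hidden) != len(teacher_hidden):
        raise ValueError("hidden vectors must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_hidden, teacher_hidden)]
    return sum(squared) / len(squared)


def distillation_loss(logits, gold_index, student_hidden, teacher_hidden, weight=1.0):
    """Task loss plus a weighted feature-matching term (generic sketch)."""
    task = cross_entropy(logits, gold_index)
    match = feature_mse(student_hidden, teacher_hidden)
    return task + weight * match


# Toy values: four answer options with uniform logits, 2-dimensional hidden vectors.
loss = distillation_loss([0.0, 0.0, 0.0, 0.0], 0, [1.0, 2.0], [1.0, 4.0], weight=0.5)
print(loss)
```

In a real training setup the two terms would be computed on tensors inside an autodiff framework; plain Python lists are used here only to keep the arithmetic visible.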

These results suggest that current VLMs still lack robust region-aware driving intelligence and highlight GeoDrive-Bench as a diagnostic and training-oriented testbed for deployable autonomous driving foundation models.


