Do VLMs See What Sensors Feel? A Scalable Expert-Guided Design for Wheelchair Accessibility Assessment from Street View
Quick Answer
This study explores the use of vision-language models (VLMs) to assess wheelchair accessibility from Google Street View images.
Quick Take
The authors develop an expert-guided, retrieval-augmented framework and link 407 Google Street View locations at the University of Florida with GPS-derived wheelchair dwell behavior. VLM accessibility ratings are negatively correlated with dwell time, which suggests the approach could support scalable accessibility assessment.
Key Points
- The paper examines whether VLMs can identify accessibility barriers from Google Street View imagery.
- The study collected a dataset linking 407 locations with wheelchair behavior data.
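- VLM ratings were negatively correlated with dwell time, which the study uses as a behavioral proxy for mobility friction.
- Visible features such as curb ramps and crosswalks were associated with higher VLM accessibility scores.
- Alignment remained limited for subtle surface conditions, transient obstructions, and viewpoint-dependent barriers.
- Findings suggest expert-guided VLMs could support scalable assessment of wheelchair accessibility.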
Article Content
From source RSS / original summary (arXiv:2606.07642v1)
Abstract: Assessing built-environment interaction, such as wheelchair accessibility, is difficult because real-world mobility is shaped by distributed, context-dependent, and temporary barriers that are hard to capture at scale. To support scalable assessment, this paper examines whether vision-language models (VLMs) can identify accessibility barriers from Google Street View (GSV) imagery.
We propose an expert-guided retrieval-augmented framework that combines GSV images, ADA-informed guidance, and expert-derived rubrics to evaluate accessibility dimensions. We collect a campus-scale dataset at the University of Florida, linking 407 unique GSV locations with GPS-derived wheelchair dwell behavior as a mobility-friction signal.
Results show that VLM ratings are both negatively correlated and distributionally similar with dwell time, indicating partial but consistent alignment with a behavioral proxy for mobility friction. Visual cue analysis shows that certain environmental objects, such as curb ramps and crosswalks, are associated with higher VLM accessibility scores, while alignment remains limited for subtle surface conditions, transient obstructions, and viewpoint-dependent barriers.
Overall, our findings show the potential of expert-guided VLMs for scalable accessibility assessment aligning with sensor-derived indicators of real-world wheelchair navigation.
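To make the reported negative correlation concrete, here is a minimal sketch of how a rank correlation between per-location VLM ratings and dwell times could be computed. The summary does not say which correlation statistic the authors used, so Spearman's rho is an assumption here, and the `vlm_scores` and `dwell_seconds` values are invented toy numbers, not data from the paper.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


# Hypothetical toy data (NOT from the paper): one entry per street-view location.
vlm_scores = [5, 4, 4, 3, 2, 1]                   # higher = rated more accessible
dwell_seconds = [3.0, 5.5, 4.0, 9.0, 12.5, 20.0]  # longer = more mobility friction

rho = spearman(vlm_scores, dwell_seconds)
print(f"Spearman rho = {rho:.3f}")  # negative when higher ratings pair with shorter dwell
```

A rank-based statistic is a natural fit for ordinal rubric scores, but the choice is illustrative only. The abstract's claim of distributional similarity would need a separate comparison that this sketch does not cover.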
