Learning to See Like Humans: Gaze-Aligned Cycling Safety Prediction
Quick Take
The Eye-Tracking-Guided Perceived Cycling Safety framework (EG-PCS) integrates gaze data into a pairwise learning pipeline using vision transformers, enhancing predictive accuracy and interpretability in urban cycling safety assessments. Experiments show that gaze-guided models match state-of-the-art performance while better reflecting human visual attention patterns.
Key Points
- EG-PCS uses eye-tracking data to improve the prediction of perceived cycling safety.
- The model aligns attention maps with human fixation patterns.
- Gaze-guided models achieve ranking performance similar to state-of-the-art approaches.
- Incorporating eye-tracking improves interpretability in urban analytics.
- Perceived safety is a key barrier to cycling adoption in cities.
Article Content
From source RSS / original summary
arXiv:2605.24040v1
Abstract: Cycling delivers significant public-health and environmental benefits, yet its uptake in cities is often limited by perceived safety. When street environments appear unsafe, individuals are less likely to cycle, making perception a key barrier to adoption. Recent work has shown that pairwise comparisons of street-view images provide a scalable way to learn subjective safety judgments.
However, existing approaches do not explicitly model human visual attention, which plays a central role in how humans perceive safety. We propose an Eye-Tracking-Guided Perceived Cycling Safety framework (EG-PCS) that integrates gaze data into a pairwise learning pipeline based on vision transformers. By supervising the model's attention mechanism with eye-tracking signals, we encourage alignment between learned attention maps and human fixation patterns.
Experiments show that gaze-guided models achieve similar ranking performance compared to state-of-the-art approaches while producing attention maps that more accurately reflect human visual attention behavior. Our results demonstrate that incorporating eye-tracking information enhances both predictive accuracy and interpretability in perception-based urban analytics.
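The abstract does not spell out the training objective, so the following is only a minimal sketch of how a pairwise ranking loss could be combined with a gaze-alignment term. Everything in it is an assumption made for illustration: the logistic (RankNet-style) ranking loss, the KL divergence between a gaze heatmap and the model's attention over image patches, the function names, and the `gaze_weight` parameter. None of these are confirmed details of EG-PCS.

```python
import math


def pairwise_ranking_loss(score_a, score_b, a_preferred):
    """Logistic loss for one image pair (assumed form, not from the paper).

    The loss is small when the image humans judged safer gets the higher score.
    """
    margin = score_a - score_b if a_preferred else score_b - score_a
    # Numerically stable softplus(-margin) = log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def normalize(values, eps=1e-8):
    """Turn non-negative patch weights into a probability distribution."""
    total = sum(values)
    return [(v + eps) / (total + eps * len(values)) for v in values]


def attention_alignment_loss(model_attention, gaze_heatmap):
    """KL(gaze || attention) over patches (assumed alignment measure)."""
    p = normalize(gaze_heatmap)
    q = normalize(model_attention)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def total_loss(score_a, score_b, a_preferred,
               attn_a, gaze_a, attn_b, gaze_b, gaze_weight=0.5):
    """Ranking loss plus weighted gaze alignment for both images in the pair."""
    rank = pairwise_ranking_loss(score_a, score_b, a_preferred)
    align = (attention_alignment_loss(attn_a, gaze_a)
             + attention_alignment_loss(attn_b, gaze_b))
    return rank + gaze_weight * align


if __name__ == "__main__":
    # Hypothetical 4-patch attention and gaze maps for two street-view images.
    loss = total_loss(
        score_a=1.2, score_b=0.3, a_preferred=True,
        attn_a=[0.1, 0.6, 0.2, 0.1], gaze_a=[0.0, 0.7, 0.2, 0.1],
        attn_b=[0.25, 0.25, 0.25, 0.25], gaze_b=[0.5, 0.1, 0.1, 0.3],
    )
    print(f"combined loss: {loss:.4f}")
```

In this sketch, setting `gaze_weight` to zero reduces the objective to plain pairwise ranking, which is one way to picture the comparison between gaze-guided and unguided models that the abstract describes.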
