AQIFormer: A Transformer-Based Multi-View Architecture for Cross-City Air Quality Classification
Quick Answer
AQIFormer is a transformer-based architecture that improves cross-city air quality classification, achieving 89.96% accuracy on a dataset of 26,678 synchronized image pairs.
Quick Take
AQIFormer integrates dual-view imagery and weather-aware attention, and it generalizes across cities: on an independent dataset from Nagpur, India, accuracy falls by only 8.29 percentage points.
Key Points
- AQIFormer achieves 89.96% accuracy, a reported 14.96% improvement over state-of-the-art methods.
- Utilizes dual-view integration of front and rear traffic imagery.
- Employs weather-aware attention mechanisms for enhanced performance.
- Maintains 81.67% accuracy on an independent dataset from Nagpur, India, using few-shot adaptation with minimal training samples.
- Addresses scalability and economic constraints of traditional sensor-based systems.
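The figures above can be cross-checked with a few lines of arithmetic. The sketch below confirms that the 8.29% degradation is the difference, in percentage points, between the two reported accuracies. The `implied_baseline` value is an assumption: it only holds if the 14.96% improvement is also measured in percentage points, which the summary does not state.

```python
# Reported figures from the summary.
in_domain_acc = 89.96   # accuracy (%) on the 26,678-pair dataset
nagpur_acc = 81.67      # accuracy (%) on the independent Nagpur dataset
reported_gain = 14.96   # stated improvement over state-of-the-art methods

# Cross-city degradation as an absolute difference (percentage points).
drop_points = round(in_domain_acc - nagpur_acc, 2)

# The same drop expressed relative to the in-domain accuracy.
drop_relative = round(100 * (in_domain_acc - nagpur_acc) / in_domain_acc, 2)

# ASSUMPTION: the gain is in percentage points; the source does not say so.
implied_baseline = round(in_domain_acc - reported_gain, 2)

print(drop_points, drop_relative, implied_baseline)
```

The absolute difference matches the reported 8.29 figure, so the degradation is best read as percentage points rather than a relative drop.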
Article Content
arXiv:2606.07648v1
Abstract: Air pollution represents one of the most critical environmental and public health challenges globally, with traditional sensor-based monitoring systems facing significant scalability and economic constraints. Image-based air quality estimation has emerged as a promising alternative, leveraging the visual characteristics of atmospheric pollutants in traffic scenes.
However, existing methods suffer from limited cross-city generalization and inadequate exploitation of multi-view perspectives. We present AQIFormer, a novel transformer-based ensemble architecture that addresses these fundamental limitations through innovative dual-view integration, weather-aware attention mechanisms, and comprehensive multi-task learning.
Our approach uniquely combines front and rear traffic imagery with meteorological parameters to achieve robust air quality classification across diverse urban environments. Extensive evaluation on a comprehensive dataset of 26,678 synchronized front-rear image pairs demonstrates good performance with 89.96% accuracy, representing a 14.96% improvement over state-of-the-art methods. Most importantly, our model maintains exceptional cross-city generalization capabilities, achieving 81.67% accuracy on an independent dataset collected in Nagpur, India with only 8.29% performance degradation using few-shot adaptation with minimal training samples.
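The abstract does not describe how the dual-view integration or weather-aware attention is implemented. As a rough illustration of the general idea only, the sketch below pools feature tokens from both camera views with attention weights driven by a weather vector. Every name here (`weather_aware_fusion`, `weather_query`, the toy token values) is hypothetical and not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weather_aware_fusion(front_tokens, rear_tokens, weather_query):
    """Attention-pool tokens from two views using a weather-derived query.

    front_tokens, rear_tokens: lists of d-dimensional feature vectors
        (hypothetical patch embeddings from the front and rear images).
    weather_query: d-dimensional vector, ASSUMED to be an embedding of the
        meteorological parameters.
    """
    tokens = front_tokens + rear_tokens  # dual view: one joint token set
    d = len(weather_query)
    # Scaled dot-product score of each token against the weather query.
    scores = [dot(weather_query, t) / math.sqrt(d) for t in tokens]
    weights = softmax(scores)
    # Weighted sum of tokens gives one fused feature vector.
    fused = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(d)]
    return fused, weights

# Toy inputs: two front tokens, one rear token, 2-dimensional features.
front = [[1.0, 0.0], [0.0, 1.0]]
rear = [[1.0, 1.0]]
fused, weights = weather_aware_fusion(front, rear, [0.5, -0.5])
```

In this toy setup the weather vector decides which tokens dominate the fused representation; a real model would learn the projections and feed the fused vector to a classification head.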