AQIFormer: A Transformer-Based Multi-View Architecture for Cross-City Air Quality Classification

arXiv cs.CV·Om Kathalkar, Nitin Nilesh, Sachin Chaudhari, Anoop Namboodiri

·~1 min·6/9/2026·en·0

Quick Answer

AQIFormer is a transformer-based architecture that improves cross-city air quality classification, achieving 89.96% accuracy on a dataset of 26,678 synchronized image pairs.

Quick Take

AQIFormer integrates dual-view traffic imagery with weather-aware attention. After few-shot adaptation it reaches 81.67% accuracy on an independent dataset from Nagpur, India, a drop of only 8.29 percentage points from its 89.96% result on the main dataset.

Key Points

AQIFormer achieves 89.96% accuracy, a 14.96% improvement over existing methods.
Utilizes dual-view integration of front and rear traffic imagery.
Employs weather-aware attention mechanisms for enhanced performance.
Maintains 81.67% accuracy on an independent dataset from Nagpur, India, after few-shot adaptation with minimal training samples.
Addresses scalability and economic constraints of traditional sensor-based systems.
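
The summary does not describe how the dual-view integration or the weather-aware attention are implemented, so the following is only a minimal sketch of one way such a fusion could work, not the authors' method. It assumes each view has already been encoded into a fixed-length embedding, and every name here (`weather_aware_fusion`, `proj`, the example weather features) is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weather_aware_fusion(front, rear, weather, proj):
    """Fuse two view embeddings with attention weights conditioned on weather.

    proj is a hypothetical learned matrix (one row per embedding dimension,
    one column per weather feature) that maps weather into a query vector.
    """
    d = len(front)
    query = [dot(row, weather) for row in proj]
    # Scaled dot-product score of the weather query against each view.
    scores = [dot(query, view) / math.sqrt(d) for view in (front, rear)]
    weights = softmax(scores)
    fused = [weights[0] * f + weights[1] * r for f, r in zip(front, rear)]
    return fused, weights


# Illustrative values only; real embeddings would come from an image encoder.
front = [0.2, 0.9, -0.1, 0.4]
rear = [0.5, 0.1, 0.3, -0.2]
weather = [0.7, 0.3]  # e.g. normalized humidity and wind speed (assumed)
proj = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-0.5, 0.5]]

fused, weights = weather_aware_fusion(front, rear, weather, proj)
print(weights, fused)
```

The two weights always sum to 1, so the fused vector is a convex combination of the front and rear embeddings whose balance shifts with the weather input.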

Article Content

From source RSS / original summary

arXiv:2606.07648v1 (announce type: new)

Abstract: Air pollution represents one of the most critical environmental and public health challenges globally, with traditional sensor-based monitoring systems facing significant scalability and economic constraints. Image-based air quality estimation has emerged as a promising alternative, leveraging the visual characteristics of atmospheric pollutants in traffic scenes.

However, existing methods suffer from limited cross-city generalization and inadequate exploitation of multi-view perspectives. We present AQIFormer, a novel transformer-based ensemble architecture that addresses these fundamental limitations through innovative dual-view integration, weather-aware attention mechanisms, and comprehensive multi-task learning.

Our approach uniquely combines front and rear traffic imagery with meteorological parameters to achieve robust air quality classification across diverse urban environments. Extensive evaluation on a comprehensive dataset of 26,678 synchronized front-rear image pairs demonstrates good performance with 89.96% accuracy, representing a 14.96% improvement over state-of-the-art methods. Most importantly, our model maintains exceptional cross-city generalization capabilities, achieving 81.67% accuracy on an independent dataset collected in Nagpur, India with only 8.29% performance degradation using few-shot adaptation with minimal training samples.
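
As a quick check on how the reported figures relate, the short calculation below uses only the numbers quoted in the abstract. It shows that the 8.29% degradation is an absolute difference in percentage points. The implied baseline is an assumption: it holds only if the 14.96% improvement is also measured in percentage points, which the abstract does not state.

```python
# Figures quoted in the abstract.
full_acc = 89.96       # accuracy on the main dataset (%)
nagpur_acc = 81.67     # accuracy on the Nagpur dataset (%)
reported_gain = 14.96  # reported improvement over prior methods

# Absolute drop in percentage points; matches the reported 8.29.
absolute_drop = round(full_acc - nagpur_acc, 2)

# The same drop expressed relative to the main-dataset accuracy (about 9.2%).
relative_drop = round(100 * (full_acc - nagpur_acc) / full_acc, 2)

# Assumption: if the gain is in percentage points, the prior best is 75.00%.
implied_baseline = round(full_acc - reported_gain, 2)

print(absolute_drop, relative_drop, implied_baseline)
```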

