Automated Quality Assessment of Geospatial Vector Data: A GeoAI Approach using Spatial Representation Learning
Quick Answer
Topo4Vec is an automated GeoAI framework for scalable quality assessment of geospatial vector data. It reaches a peak accuracy of 0.99 for detecting overlapping building footprints and 0.60 for detecting overshoots and undershoots in street networks.
Quick Take
Topo4Vec uses Spatial Representation Learning to encode native vector geometries into a latent space where topological errors are isolated from valid ones, targeting the diverse urban morphologies and large data volumes that classic rule-based checks struggle with. The paper evaluates the framework across three study areas: Los Angeles, Munich, and Singapore.
Key Points
- Topo4Vec automates quality assessment of geospatial vector data using advanced Spatial Representation Learning.
- Reaches a peak accuracy of 0.99 for detecting overlapping building footprints across the three study areas.
- Reaches a peak accuracy of 0.60 for detecting overshoots and undershoots in street networks.
- Framework addresses challenges of large data volumes and complex urban morphologies.
- Code and data are openly available for further research and application.
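The overlapping-footprint case named in the key points can be illustrated with a small simulation. This is a minimal sketch and not the paper's implementation: it assumes footprints are simplified to axis-aligned rectangles, and `Box`, `overlap_area`, `simulate_overlap`, and `make_labeled_pairs` are hypothetical names introduced here. It shows how a valid pair of footprints could be turned into a labeled erroneous pair without manual annotation.

```python
import random
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned rectangle standing in for a building footprint (a simplification)."""
    minx: float
    miny: float
    maxx: float
    maxy: float


def overlap_area(a: Box, b: Box) -> float:
    """Area shared by two boxes; 0.0 when they only touch or lie apart."""
    width = min(a.maxx, b.maxx) - max(a.minx, b.minx)
    height = min(a.maxy, b.maxy) - max(a.miny, b.miny)
    return width * height if width > 0 and height > 0 else 0.0


def simulate_overlap(fixed: Box, neighbor: Box, rng: random.Random) -> Box:
    """Slide `neighbor` left until it intrudes into `fixed`.

    Assumes `neighbor` lies to the right of `fixed` and shares part of its y-range.
    """
    gap = neighbor.minx - fixed.maxx
    # Intrude by 10% to 50% of the neighbor's width (an arbitrary illustrative range).
    intrusion = rng.uniform(0.1, 0.5) * (neighbor.maxx - neighbor.minx)
    dx = -(gap + intrusion)
    return Box(neighbor.minx + dx, neighbor.miny, neighbor.maxx + dx, neighbor.maxy)


def make_labeled_pairs(
    fixed: Box, neighbor: Box, rng: random.Random
) -> List[Tuple[Box, Box, int]]:
    """Return one valid pair (label 0) and one simulated overlap (label 1)."""
    return [
        (fixed, neighbor, 0),
        (fixed, simulate_overlap(fixed, neighbor, rng), 1),
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
house = Box(0.0, 0.0, 10.0, 10.0)
garage = Box(12.0, 0.0, 20.0, 10.0)
samples = make_labeled_pairs(house, garage, rng)
```

Real footprints are arbitrary polygons, so a faithful version would need a general polygon intersection test rather than the rectangle arithmetic used here.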
Abstract: Geospatial vector data quality is a foundational research topic in GIS, yet classic rule-based quality assessment algorithms often struggle with diverse urban morphologies and massive data volumes. Recently, Geospatial Artificial Intelligence (GeoAI) has shown promising potential for automating geospatial analysis, while its application to native vector data remains largely underexplored. To fill this research gap, we propose Topo4Vec, an automated GeoAI framework designed for scalable vector data quality assessment via advanced Spatial Representation Learning (SRL). Specifically, Topo4Vec relaxes the labor-intensive manual annotation process via topological error simulation, covering overlapping polygons and street network connectivity errors (e.g., overshoots and undershoots). Then, it leverages state-of-the-art SRL approaches to encode complex, native vector geometries (e.g., polylines and polygons) into a latent space where topological errors are isolated from valid ones. A systematic performance evaluation across three study areas (Los Angeles, Munich, and Singapore) demonstrates the effectiveness and robustness of Topo4Vec, achieving a peak accuracy of 0.99 for detecting overlapping building footprints and 0.60 for overshoots and undershoots in street networks. Moreover, lessons learned from Topo4Vec point toward a scalable and autonomous GeoAI approach for large-scale vector data consistency and quality monitoring within fast-growing geospatial data ecosystems. The code and data used in the paper are made openly available at this https URL.
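The street network errors mentioned in the abstract can be sketched in the same spirit. The snippet below is an illustrative assumption, not the authors' procedure: `move_endpoint` is a hypothetical helper that moves the last vertex of a polyline along its final segment, so that a positive offset produces an overshoot (the line runs past where it should end) and a negative offset produces an undershoot (the line stops short).

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def move_endpoint(line: List[Point], offset: float) -> List[Point]:
    """Return a copy of `line` with its last vertex moved along the final segment.

    Positive `offset` extends the line (overshoot); negative `offset`
    shortens it (undershoot). The input list is left unchanged.
    """
    (x0, y0), (x1, y1) = line[-2], line[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0.0:
        raise ValueError("final segment has zero length")
    if offset <= -length:
        raise ValueError("undershoot would remove the whole final segment")
    # Unit vector pointing from the second-to-last vertex to the last one.
    ux, uy = (x1 - x0) / length, (y1 - y0) / length
    return line[:-1] + [(x1 + ux * offset, y1 + uy * offset)]


# A street that should end exactly at an intersection located at x = 10.
street = [(0.0, 0.0), (10.0, 0.0)]
overshoot = move_endpoint(street, 2.0)    # ends past the intersection
undershoot = move_endpoint(street, -2.0)  # stops short of the intersection
```

Pairing each perturbed line with its untouched original would give labeled examples of the kind a classifier over learned geometry embeddings could be trained on, though the encoder itself is outside the scope of this sketch.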
| Subjects: | Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2606.28390 [cs.CV] (or arXiv:2606.28390v1 [cs.CV] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2606.28390 (arXiv-issued DOI via DataCite) |
Submission history
From: Hao Li
[v1] Tue, 23 Jun 2026 09:41:59 UTC (26,653 KB)
— Originally published at arxiv.org