Plan2Map: A Multimodal Benchmark for Document-Grounded Geospatial Boundary Reconstruction from Planning Records
Quick Take
Plan2Map introduces a 208-case benchmark for reconstructing geospatial boundaries from UK planning documents. The GeoPlanAgent system achieves a mean IoU of 0.736, significantly outperforming baseline models, highlighting the challenges in localization and map registration.
Key Points
- Plan2Map benchmark focuses on document-grounded geospatial boundary reconstruction.
- GeoPlanAgent achieves 0.736 mean IoU and 0.904 median IoU.
- 67.8% of predictions reach or exceed 0.8 IoU.
- Errors are mainly in localization and map registration.
- Supervised boundary segmentation improves pixel-level mask quality.
Article Content
From source RSS / original summaryarXiv:2606. 02747v1 Announce Type: new Abstract: Planning records define restrictions over geographic areas, but their source documents often provide only indirect spatial evidence rather than machine-readable boundaries. We introduce Plan2Map, a 208-case multimodal benchmark for document-grounded geospatial boundary reconstruction from UK planning records.
Given only a source planning document, systems must reconstruct a valid geospatial boundary from notice text, schedules, map plates, map labels, and boundary annotations; the reference GeoJSON is held out for scoring. We propose GeoPlanAgent, a document-grounded, geospatial-tool-in-the-loop system that decomposes the task into evidence extraction, localisation, map registration, boundary segmentation, projection, and verification. On Plan2Map, GeoPlanAgent achieves 0. 736 mean IoU and 0. 904 median IoU, with 67.
8\% of predictions at or above 0. 8 IoU, substantially outperforming direct VLM-to-GeoJSON baselines. Diagnostic analysis shows that direct VLM prediction remains unreliable, while remaining errors are concentrated in localisation and map registration, and supervised boundary segmentation substantially improves pixel-level mask quality. Plan2Map provides a concrete testbed for multimodal geospatial reconstruction from public planning records. Project page: https://odeb1. github. io/Plan2Map_Project_Page/.
Reader Mode unavailable (could not extract clean content).
Want this in your inbox every morning?
Daily brief at your local 8am — bilingual EN/中文, free.
More from arXiv cs.CV
See more →Evi-Steer: Learning to Steer Biomedical Vision-Language Models through Efficient and Generalizable Evidential Tuning
Evi-Steer introduces a novel evidential tuning framework for BiomedCLIP, enabling efficient fine-tuning with only 0.11% parameter updates. It significantly enhances performance in few-shot learning and domain shifts across 15 biomedical imaging datasets, demonstrating robustness for clinical applications.