GeoISF: Instance Semantic Forest Inspired Large-Scale Cross-View Geo-Localization via Ground LiDAR-to-Satellite Image

arXiv cs.CV·Di Hu, Xia Yuan, Chunxia Zhao


·~2 min·6/30/2026·en·0

Quick Take

GeoISF is a large-scale LiDAR-to-image geo-localization pipeline that improves cross-view localization accuracy, scoring 13.22 times better in R@10 on the KITTI dataset than the parallel LiDAR-to-image method it is compared against. It builds an instance semantic forest from WordNet to strengthen semantic representation, and uses that representation as a shared medium to bridge the modality gap between point clouds and satellite images. The authors state that the code will be released as open source.

Key Points

GeoISF enhances semantic matching accuracy for large-scale geo-localization tasks.
Scores 13.22 times better in R@10 than the parallel LiDAR-to-image method on the KITTI dataset.
Utilizes an instance semantic forest constructed from WordNet for improved representation.
Addresses challenges in computational efficiency and accuracy in cross-view localization.
Open-source code will be available for the broader research community.
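
To make the "instance semantic forest" idea more concrete, here is a minimal sketch of how per-frame semantic trees might be merged into a forest. It is an illustration under stated assumptions, not the authors' construction: the paper's code is not yet released, `TOY_HYPERNYMS` is a hypothetical hand-written table standing in for real WordNet hypernym lookups, and the function names and count-based tree representation are invented for this example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical child -> parent (hypernym) table standing in for WordNet.
# A real pipeline would query WordNet itself; this table is an assumption.
TOY_HYPERNYMS: Dict[str, str] = {
    "car": "vehicle",
    "truck": "vehicle",
    "vehicle": "artifact",
    "building": "structure",
    "fence": "structure",
    "structure": "artifact",
    "tree": "plant",
    "plant": "organism",
}

Path = Tuple[str, ...]


def hypernym_path(label: str) -> Path:
    """Return the path from the root concept down to `label`."""
    path = [label]
    while path[-1] in TOY_HYPERNYMS:
        path.append(TOY_HYPERNYMS[path[-1]])
    return tuple(reversed(path))


def frame_tree(labels: Iterable[str]) -> Counter:
    """Encode one frame's semantic tree as counts over root-to-node paths."""
    tree: Counter = Counter()
    for label in labels:
        path = hypernym_path(label)
        # Every prefix of the path is a node in the tree.
        for depth in range(1, len(path) + 1):
            tree[path[:depth]] += 1
    return tree


def merge_frames(frames: Iterable[Iterable[str]]) -> Counter:
    """Sum per-frame trees; distinct roots make the result a forest."""
    forest: Counter = Counter()
    for labels in frames:
        forest.update(frame_tree(labels))
    return forest


# Two hypothetical frames of detected instance labels.
forest = merge_frames([["car", "building"], ["car", "tree"]])
roots = {path[0] for path in forest}
```

In this toy encoding, instances seen across several frames accumulate higher counts on their shared ancestor nodes, which is one simple way multi-frame integration could make a scene descriptor more discriminative.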

[Submitted on 17 Jun 2026]

Abstract: The problem of localization on a large-scale satellite image given a frame of query ground view point clouds remains challenging. Existing LiDAR-to-image cross-view localization methods struggle in large-scale scenarios due to limited semantic alignment and the modality gap between point clouds and satellite images. This paper introduces the large-scale LiDAR-to-image geo-localization pipeline called GeoISF. GeoISF introduces an instance semantic forest constructed using WordNet, which enhances temporal semantic representation and discriminative power by integrating semantic trees from multiple frames. By leveraging environmental semantic representation as a shared medium, GeoISF effectively bridges the modality gap and improves semantic matching accuracy. Extensive experiments demonstrate the superior performance of GeoISF in large-scale cross-view localization, 13.22 times better than the parallel LiDAR-to-image method in the R@10 metric on the KITTI dataset. The proposed method addresses the existing gap in large-scale LiDAR-to-image cross-view localization, offering a robust solution to the computational and accuracy challenges inherent in such scenarios. We will release the code as an open-source resource available online for the broader research community.
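
The abstract reports results in R@10 (recall at 10). As a hedged illustration of how a recall-at-K score is commonly computed for retrieval-style localization, the sketch below counts a query as a hit when its true location ID appears among the top K ranked candidates. The discrete tile IDs and the function name `recall_at_k` are assumptions made for this example; the paper may define a hit differently, for instance with a distance threshold.

```python
from typing import Sequence


def recall_at_k(
    ranked_candidates: Sequence[Sequence[int]],
    ground_truth: Sequence[int],
    k: int,
) -> float:
    """Fraction of queries whose true ID is in the top-k candidates."""
    if len(ranked_candidates) != len(ground_truth):
        raise ValueError("one ranked list is required per query")
    if not ground_truth:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(ranked_candidates, ground_truth)
        if truth in ranked[:k]
    )
    return hits / len(ground_truth)


# Hypothetical rankings of satellite tile IDs for three LiDAR queries.
rankings = [[3, 1, 2], [0, 2, 1], [2, 0, 1]]
truth = [1, 1, 0]
r_at_2 = recall_at_k(rankings, truth, k=2)  # two of three queries hit
```

Under this definition, a claim such as "13.22 times better in R@10" would compare two such fractions computed over the same query set.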

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2606.28371 [cs.CV]
	(or arXiv:2606.28371v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2606.28371 arXiv-issued DOI via DataCite

Submission history

From: Di Hu
[v1] Wed, 17 Jun 2026 02:17:02 UTC (1,132 KB)

— Originally published at arxiv.org

