GeoISF: Instance Semantic Forest Inspired Large-Scale Cross-View Geo-Localization via Ground LiDAR-to-Satellite Image
Quick Answer
GeoISF introduces a novel large-scale LiDAR-to-image geo-localization pipeline that significantly enhances cross-view localization accuracy, achieving 13.22 times better performance than existing methods on the KITTI dataset.
Quick Take
GeoISF introduces a novel large-scale LiDAR-to-image geo-localization pipeline that significantly enhances cross-view localization accuracy, achieving 13.22 times better performance than existing methods on the KITTI dataset. By utilizing an instance semantic forest for improved semantic representation, it effectively bridges the modality gap between point clouds and satellite images. The code will be released as an open-source resource for the research community.
Key Points
- GeoISF enhances semantic matching accuracy for large-scale geo-localization tasks.
- Achieves 13.22 times better performance than parallel LiDAR-to-image methods.
- Utilizes an instance semantic forest constructed from WordNet for improved representation.
- Addresses challenges in computational efficiency and accuracy in cross-view localization.
- Open-source code will be available for the broader research community.
Paper Resources
📖 Reader Mode
~2 min readAbstract:The problem of localization on a large-scale satellite image given a frame of query ground view point clouds remains challenging. Existing LiDAR-to-image cross-view localization methods struggle in large-scale scenarios due to limited semantic alignment and the modality gap between point clouds and satellite images. This paper introduces the large-scale LiDAR-to-image geo-localization pipeline called GeoISF. GeoISF introduces an instance semantic forest constructed using WordNet, which enhances temporal semantic representation and discriminative power by integrating semantic trees from multiple frames. By leveraging environmental semantic representation as a shared medium, GeoISF effectively bridges the modality gap and improves semantic matching accuracy. Extensive experiments demonstrate the superior performance of GeoISF in large-scale cross-view localization, 13.22 times better than the parallel LiDAR-to-image method in the R@10 metric on the KITTI dataset. The proposed method addresses the existing gap in large-scale LiDAR-to-image cross-view localization, offering a robust solution to the computational and accuracy challenges inherent in such scenarios. We will release the code as an open-source resource available online for the broader research community.
| Subjects: | Computer Vision and Pattern Recognition (cs.CV) |
| Cite as: | arXiv:2606.28371 [cs.CV] |
| (or arXiv:2606.28371v1 [cs.CV] for this version) | |
| https://doi.org/10.48550/arXiv.2606.28371 arXiv-issued DOI via DataCite |
Submission history
From: Di Hu [view email]
[v1]
Wed, 17 Jun 2026 02:17:02 UTC (1,132 KB)
— Originally published at arxiv.org
Want this in your inbox every morning?
Daily brief at your local 8am — bilingual EN/中文, free.
More from arXiv cs.CV
See more →LLM-Guided ANN Index Optimization for Human-Object Interaction Retrieval
A phase-aware LLM agent optimizes human-object interaction retrieval, outperforming Optuna TPE by 33.3% and VDTuner by 34.2% on the HICO-DET benchmark. This method enhances throughput by 15.3x over UniIR and demonstrates strong transferability across vector database management systems.