Robust Scene Transfer for PointGoal Navigation via Privileged Sensor Guided Contrastive Learning

arXiv cs.CV·Amirhossein Zhalehmehrabi, Tiziano Tezze, Alberto Castelini, Alessandro Farinelli

6/5/2026

Quick Answer

This paper presents a sensor-guided adaptive contrastive learning framework for PointGoal navigation, leveraging LiDAR data to enhance visual representation learning.

Quick Take

The method significantly improves policy-level scene transfer in diverse environments, outperforming large pretrained models and standard contrastive baselines, while relying solely on monocular RGB observations during deployment.

Key Points

Introduces a geometry-aware similarity metric for contrastive learning.
Decouples representation learning from policy optimization using a frozen encoder.
Demonstrates significant improvements in scene transfer across indoor and outdoor settings.
Agent operates using only monocular RGB and standard task-related inputs.
Releases a multimodal dataset for future research in navigation representation learning.

Source Excerpt

arXiv:2606.05506v1 Announce Type: new Abstract: We propose a sensor-guided adaptive contrastive learning framework for visual representation learning in PointGoal navigation. During training, privileged LiDAR sensing guides the contrastive objective through a geometry-aware similarity metric and adaptive temperature scaling, encouraging visual embeddings to capture navigation-relevant structure rather than scene-specific appearance.
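To make the idea of a LiDAR-guided contrastive objective concrete, here is a minimal sketch in plain Python. It is not the paper's formulation: the excerpt does not give the similarity metric or the temperature rule, so everything below is an assumption for illustration. The hypothetical `lidar_similarity` uses an exponential kernel on the Euclidean distance between range scans, the soft targets are those similarities normalized per anchor, and the temperature is made sharper when an anchor has a geometrically close neighbor. The names `sensor_guided_loss`, `tau_min`, `tau_max`, and `sigma` are all invented here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def lidar_similarity(scan_a, scan_b, sigma=1.0):
    """Assumed geometry-aware similarity: exponential kernel on scan distance."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(scan_a, scan_b)))
    return math.exp(-dist / sigma)


def sensor_guided_loss(embeddings, scans, tau_min=0.05, tau_max=0.5, sigma=1.0):
    """Soft-target contrastive loss whose targets and temperature come from LiDAR.

    embeddings: visual embeddings, one list of floats per frame.
    scans: LiDAR range readings for the same frames (training-time only).
    """
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        geo = [lidar_similarity(scans[i], scans[j], sigma) for j in others]
        geo_sum = sum(geo) + 1e-12
        # Soft targets: frames with similar geometry should be close in embedding space.
        targets = [g / geo_sum for g in geo]
        # Assumed adaptive temperature: sharper when a close geometric neighbor exists.
        tau = tau_min + (tau_max - tau_min) * (1.0 - max(geo))
        logits = [cosine(embeddings[i], embeddings[j]) / tau for j in others]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += -sum(t * (l - log_z) for t, l in zip(targets, logits))
    return total / n


# Toy batch: frames 0 and 1 share geometry, as do frames 2 and 3.
scans = [[1.0, 1.0], [1.1, 1.0], [5.0, 5.0], [5.0, 5.1]]
aligned = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
misaligned = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
loss_aligned = sensor_guided_loss(aligned, scans)
loss_misaligned = sensor_guided_loss(misaligned, scans)
```

In this toy batch the embeddings that mirror the LiDAR geometry score a lower loss than the shuffled ones, which is the behavior a geometry-guided objective is meant to reward. LiDAR appears only inside the loss, so nothing here would be needed at deployment.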

The resulting encoder is pretrained independently, frozen, and used as the perceptual backbone for reinforcement learning, decoupling representation learning from policy optimization. …
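The decoupling described here can be pictured with a small plain-Python sketch. It assumes a toy linear encoder and a toy linear policy head; the class names `FrozenLinearEncoder` and `LinearPolicy`, the weight values, and the simplified update rule are all hypothetical stand-ins, not the paper's architecture or its reinforcement learning algorithm. The only point illustrated is that policy updates touch the policy weights and never the encoder.

```python
import random


class FrozenLinearEncoder:
    """Stand-in for a pretrained visual encoder whose weights are never updated."""

    def __init__(self, weights):
        # Stored as nested tuples so the weights cannot be modified in place.
        self._weights = tuple(tuple(row) for row in weights)

    def encode(self, rgb_features):
        return [sum(w * x for w, x in zip(row, rgb_features)) for row in self._weights]


class LinearPolicy:
    """Toy policy head over [embedding, goal vector]; the only trainable part."""

    def __init__(self, embed_dim, goal_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(embed_dim + goal_dim)]
            for _ in range(n_actions)
        ]

    def scores(self, embedding, goal):
        x = list(embedding) + list(goal)
        return [sum(w * v for w, v in zip(row, x)) for row in self.weights]

    def act(self, embedding, goal):
        s = self.scores(embedding, goal)
        return max(range(len(s)), key=s.__getitem__)

    def update(self, embedding, goal, action, advantage, lr=0.1):
        # Placeholder for a real RL update: nudge the chosen action's score
        # up or down in proportion to the advantage.
        x = list(embedding) + list(goal)
        self.weights[action] = [
            w + lr * advantage * v for w, v in zip(self.weights[action], x)
        ]


encoder = FrozenLinearEncoder([[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]])
policy = LinearPolicy(embed_dim=2, goal_dim=2, n_actions=3)

embedding = encoder.encode([1.0, 2.0, 3.0])  # stand-in for RGB features
goal = [2.0, 0.5]                            # e.g. distance and heading to the goal
action = policy.act(embedding, goal)
policy.update(embedding, goal, action, advantage=1.0)
```

Because the encoder exposes no update method, any training signal from the policy stops at the embedding, which mirrors the "pretrained independently, frozen" setup in the abstract.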
