QueryGaussian: Scalable and Training-Free Open-Vocabulary 3D Instance Retrieval

arXiv cs.CV·Xiuyuan Zhu, Ke Lu, Zijie Yang, Chao Yue, Jian Xue, Dongming Zhang

6/19/2026

Quick Answer

QueryGaussian introduces a training-free framework for scalable open-vocabulary 3D instance retrieval, achieving over 70% GPU memory reduction and 180x faster inference.

Quick Take

This method leverages pre-trained 2D models for semantic interpretation, enabling efficient retrieval in city-scale environments with millions of instances.

Key Points

QueryGaussian reduces GPU memory usage by over 70% compared to existing methods.
Achieves 180x faster inference, making it suitable for real-time applications.
Utilizes pre-trained 2D vision models for effective semantic understanding.
Decouples semantic understanding from geometric representation for improved efficiency.
Enables retrieval in city-scale scenes with tens of millions of instances.
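
The decoupling described in the key points can be pictured with a small sketch. This is not the paper's implementation. It assumes each 3D instance already has a single embedding vector, stored apart from the geometry, and that a text query has been embedded into the same space. The function names `cosine` and `retrieve`, the instance ids, and the toy 3-dimensional vectors are all hypothetical; a real system would obtain the vectors from a pre-trained vision-language encoder.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query_embedding, instance_embeddings, top_k=1):
    """Rank instance ids by similarity to the query embedding.

    instance_embeddings maps an instance id to one vector per instance,
    kept separately from the 3D geometry (an assumption for illustration).
    """
    scored = [(cosine(query_embedding, emb), inst_id)
              for inst_id, emb in instance_embeddings.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [inst_id for _, inst_id in scored[:top_k]]

# Toy embeddings standing in for encoder outputs (hypothetical values).
instance_embeddings = {
    "bench_01": [0.9, 0.1, 0.0],
    "tree_07": [0.1, 0.9, 0.1],
    "car_42": [0.0, 0.2, 0.9],
}
# Stand-in for the embedding of a text prompt such as "a parked car".
query = [0.05, 0.1, 0.95]

print(retrieve(query, instance_embeddings, top_k=2))
```

Because the query is compared against a table of instance vectors rather than against features attached to every 3D primitive, the lookup cost in this sketch depends on the number of instances, not on the size of the geometric representation.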

Paper Resources

Read Paper (arxiv.org) · View PDF (arxiv.org)

Source Excerpt

Efficiently retrieving specific 3D instances from large-scale scenes via natural language prompts remains a formidable challenge in multimedia analysis. Existing approaches predominantly follow a "scene-level embedding" paradigm, which requires distilling high-dimensional semantic features into every 3D primitive. This strategy suffers from a fundamental architectural bottleneck: memory and computational costs scale linearly with scene complexity, inevitably triggering out-of-memory (OOM) failures...
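
To make the linear-scaling point in the excerpt concrete, here is a back-of-the-envelope calculation. Every number in it is an illustrative assumption rather than a figure from the paper: 512-dimensional float32 features, a scene of 10 million primitives, and 100,000 object instances. The helper `embedding_memory_gib` is hypothetical, and the per-instance case is a generic alternative for comparison, not a description of the authors' storage scheme.

```python
def embedding_memory_gib(num_vectors, dim, bytes_per_value=4):
    """Memory in GiB needed to store num_vectors feature vectors of length dim."""
    return num_vectors * dim * bytes_per_value / 2**30

# Assumed numbers for illustration only.
DIM = 512                     # feature length per vector
NUM_PRIMITIVES = 10_000_000   # 3D primitives in a large scene
NUM_INSTANCES = 100_000       # object instances in the same scene

# Scene-level embedding: one feature vector per primitive.
per_primitive = embedding_memory_gib(NUM_PRIMITIVES, DIM)
# Hypothetical alternative: one feature vector per instance.
per_instance = embedding_memory_gib(NUM_INSTANCES, DIM)

print(f"per-primitive features: {per_primitive:.2f} GiB")
print(f"per-instance features:  {per_instance:.2f} GiB")
```

Under these assumptions the per-primitive features alone take roughly 19 GiB, and the figure grows in direct proportion to the primitive count, which is the bottleneck the excerpt describes.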

Read the full article on arxiv.org


