SceneGraphGrounder: Zero-Shot 3D Visual Grounding via Structured Scene Graph Matching
Quick Take
SceneGraphGrounder enables zero-shot 3D visual grounding through structured scene graph matching.
Key Points
- Utilizes visual marker prompting for object relationships.
- Achieves multi-view consistency in 3D grounding.
- Demonstrates robust performance on mobile robots.
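To make the idea of scene graph matching concrete, here is a minimal sketch, not the paper's implementation. It assumes a scene is stored as labeled 3D objects plus pairwise relations, and that a language query has already been parsed into a (target, relation, anchor) triple. The names `SceneObject`, `Relation`, and `ground` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    obj_id: int
    label: str
    center: tuple  # (x, y, z) position; units are an assumption


@dataclass(frozen=True)
class Relation:
    subject: int    # obj_id of the object being described
    predicate: str  # e.g. "next to"
    object: int     # obj_id of the anchor object


def ground(objects, relations, target_label, predicate, anchor_label):
    """Return objects whose label and relation to an anchor match the query triple."""
    by_id = {o.obj_id: o for o in objects}
    matches = []
    for rel in relations:
        subject = by_id[rel.subject]
        anchor = by_id[rel.object]
        if (subject.label == target_label
                and rel.predicate == predicate
                and anchor.label == anchor_label):
            matches.append(subject)
    return matches


# Toy scene: two chairs, one of which is next to a table.
objects = [
    SceneObject(0, "chair", (1.0, 0.5, 0.0)),
    SceneObject(1, "chair", (4.0, 2.0, 0.0)),
    SceneObject(2, "table", (1.2, 1.0, 0.0)),
]
relations = [Relation(0, "next to", 2)]

# Query: "the chair next to the table"
result = ground(objects, relations, "chair", "next to", "table")
print([o.obj_id for o in result])
```

In this toy scene the relation disambiguates between the two chairs, which a label-only lookup could not do. How the actual system builds its graph, parses queries, or scores matches is not described in this summary.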