MM-Conv: A Multimodal Dataset and Benchmark for Context-Aware Grounding in 3D Dialogue
Quick Take
MM-Conv introduces a multimodal dataset and benchmark for context-aware grounding in 3D dialogue, targeting references that are ambiguous without the surrounding conversation.
Key Points
- Benchmark includes 6.7 hours of VR interaction data.
- Two-stage grounding pipeline resolves conversational ambiguity.
- Contextual rewriting improves grounding performance significantly.
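To make the two-stage idea concrete, here is a minimal toy sketch, not the paper's method: stage 1 rewrites a context-dependent utterance into a self-contained one using the dialogue history, and stage 2 grounds the rewritten text against a list of scene objects. Every name here (SceneObject, rewrite_with_context, ground, the pronoun list) and the word-overlap scoring are assumptions made for illustration; the summary above does not describe how MM-Conv implements either stage.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical scene representation; the real benchmark format is not given here.
@dataclass
class SceneObject:
    object_id: str
    label: str  # short description, e.g. "red mug"

# Assumed, deliberately tiny set of context-dependent words.
PRONOUNS = {"it", "that", "this"}

def tokens(text: str) -> List[str]:
    """Lowercase whitespace tokens with trailing/leading punctuation stripped."""
    return [t.strip(".,!?").lower() for t in text.split()]

def rewrite_with_context(utterance: str, history: List[str], labels: List[str]) -> str:
    """Stage 1 (toy): replace pronouns with the most recently mentioned object label."""
    words = tokens(utterance)
    if not any(w in PRONOUNS for w in words):
        return utterance  # already self-contained
    for past in reversed(history):  # most recent turn first
        past_text = " ".join(tokens(past))
        for label in labels:
            if label in past_text:
                return " ".join(label if w in PRONOUNS else w for w in words)
    return utterance  # nothing in the history to resolve against

def ground(expression: str, scene: List[SceneObject]) -> Optional[SceneObject]:
    """Stage 2 (toy): pick the object whose label shares the most words with the text."""
    expr = set(tokens(expression))
    best, best_score = None, 0
    for obj in scene:
        score = len(expr & set(tokens(obj.label)))
        if score > best_score:
            best, best_score = obj, score
    return best  # None when no label word appears in the expression

scene = [SceneObject("obj_1", "red mug"), SceneObject("obj_2", "blue lamp")]
history = ["I like the red mug on the shelf."]
utterance = "Can you pass it to me?"

rewritten = rewrite_with_context(utterance, history, [o.label for o in scene])
print(rewritten)
print(ground(utterance, scene))   # no overlap without rewriting
print(ground(rewritten, scene))   # resolves once the pronoun is replaced
```

In this toy setup the raw utterance matches no object, while the rewritten one does, which is the kind of effect the third key point attributes to contextual rewriting. A real system would replace both stages with learned models.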