3D-PLOT-LLM: Part-Level Object Tokens for 3D Large Language Models | AI Deep Signal

arXiv cs.CV · Jintang Xue, Xinyu Wang, Yixing Wu, Jingwen Chen, C.-C. Jay Kuo

6/19/2026

Quick Answer

3D-PLOT-LLM introduces part-level object tokens for 3D MLLMs, making an object's parts directly addressable by the language model. It reports stronger results than prior models on benchmarks such as PartVerse-QA and 3DCoMPaT-GrIn while adding under 1M new trainable parameters.

Quick Take

3D-PLOT-LLM surpasses existing models such as PointLLM and ShapeLLM on part-aware tasks, and it does so without heavy segmentation decoders.

Key Points

3D-PLOT-LLM reorganizes input tokens for direct part addressing in 3D MLLMs.
Achieves Jaccard 0.459 and exact-match 13.78% on the PartVerse-QA benchmark.
Outperforms PointLLM and other baselines on 3DCoMPaT-GrIn across all metrics.
Adding PartVerse-QA improves Objaverse captioning by +0.65 SBERT.
Uses under 1M new trainable parameters, significantly fewer than previous models.
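
For readers unfamiliar with the two PartVerse-QA metrics quoted above, the following is a minimal sketch of how a Jaccard score and an exact-match rate could be computed. It assumes each answer is a set of part names; that assumption, the name normalization, and all function names here are illustrative and are not taken from the paper, whose exact evaluation protocol is not given on this page.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so 'Arm  Rest' matches 'arm rest'."""
    return " ".join(name.lower().split())


def jaccard(predicted: list[str], reference: list[str]) -> float:
    """Jaccard index |P & R| / |P | R| over normalized part-name sets."""
    p = {normalize(x) for x in predicted}
    r = {normalize(x) for x in reference}
    if not p and not r:
        return 1.0  # assumption: two empty answers count as a perfect match
    return len(p & r) / len(p | r)


def exact_match(predicted: list[str], reference: list[str]) -> bool:
    """True only when the two normalized part-name sets are identical."""
    return {normalize(x) for x in predicted} == {normalize(x) for x in reference}


def evaluate(pairs: list[tuple[list[str], list[str]]]) -> tuple[float, float]:
    """Return (mean Jaccard, exact-match rate) over (predicted, reference) pairs."""
    if not pairs:
        raise ValueError("need at least one (predicted, reference) pair")
    mean_jaccard = sum(jaccard(p, r) for p, r in pairs) / len(pairs)
    em_rate = sum(exact_match(p, r) for p, r in pairs) / len(pairs)
    return mean_jaccard, em_rate


if __name__ == "__main__":
    # Hypothetical answers, not data from the benchmark.
    demo = [
        (["seat", "leg"], ["seat", "back", "leg"]),
        (["wheel"], ["Wheel"]),
    ]
    print(evaluate(demo))
```

Under this reading, a partially correct answer still earns Jaccard credit while contributing nothing to exact match, which is one way a model can score 0.459 on the former and only 13.78% on the latter.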

Paper Resources

Read Paper (arxiv.org) · View PDF (arxiv.org)

Source Excerpt

3D multimodal large language models (3D MLLMs) describe a 3D object as a whole but cannot address, name, or reason about its parts. Prior part-aware attempts add segmentation decoders, heavier 3D encoders, or bounding-box grammars at substantial parameter cost. We take a fundamentally different path: we reorganize the input token stream so that parts become directly addressable through the LLM's own vocabulary. Our model, 3D-PLOT-LLM, partitions the frozen point encoder's patches into K locally
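
The excerpt is cut off before it says how the K groups are formed or how they become tokens, so the following is only a rough illustration of the general idea of grouping encoder patches into K spatially local sets. Everything in it is an assumption made for the sketch: farthest-point seeding, nearest-seed assignment, and mean pooling are stand-in techniques, and the function names are hypothetical. It should not be read as the paper's procedure.

```python
import math

Point = tuple[float, float, float]


def farthest_point_seeds(centers: list[Point], k: int) -> list[int]:
    """Greedily pick k seed indices, each farthest from the seeds chosen so far."""
    seeds = [0]  # assumption: start from the first patch so the result is deterministic
    dist = [math.dist(c, centers[0]) for c in centers]
    while len(seeds) < k:
        nxt = max(range(len(centers)), key=lambda i: dist[i])
        seeds.append(nxt)
        # Track each patch's distance to its closest seed.
        dist = [min(d, math.dist(c, centers[nxt])) for d, c in zip(dist, centers)]
    return seeds


def group_patches(centers: list[Point], k: int) -> list[int]:
    """Assign every patch center to its nearest seed, giving k local groups."""
    if not 1 <= k <= len(centers):
        raise ValueError("k must be between 1 and the number of patches")
    seeds = farthest_point_seeds(centers, k)
    return [
        min(range(k), key=lambda g: math.dist(c, centers[seeds[g]]))
        for c in centers
    ]


def pool_groups(features: list[list[float]], assignment: list[int], k: int) -> list[list[float]]:
    """Mean-pool patch features per group, yielding one vector per group."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for feat, g in zip(features, assignment):
        counts[g] += 1
        for j, v in enumerate(feat):
            sums[g][j] += v
    # An empty group (possible only with duplicate centers) stays a zero vector.
    return [[s / (counts[g] or 1) for s in sums[g]] for g in range(k)]


if __name__ == "__main__":
    # Toy patch centers and features, not outputs of a real point encoder.
    centers = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 0.0, 0.0), (5.1, 0.0, 0.0)]
    features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
    groups = group_patches(centers, k=2)
    print(groups, pool_groups(features, groups, k=2))
```

The point of the sketch is only that a regrouping of this kind operates on the outputs of a frozen encoder and needs no learned weights of its own, which is at least consistent with the small trainable-parameter budget the summary reports.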

Read the full article on arxiv.org
