DB-3DME: From Dataset to Benchmark for Human-aligned Automatic 3D Mesh Evaluation

arXiv cs.CV·Nanshan Jia, Zhenyu Zhao, Sui Huang, Jingshen Wang, Zeyu Zheng

6/10/2026

Quick Answer

DB-3DME is a dataset and benchmark of 2,619 synthetic 3D meshes paired with human ratings on Geometry and Prompt Adherence, built to measure how closely automatic evaluators match human judgment.

Quick Take

Benchmarking state-of-the-art VLMs on DB-3DME identifies the visual encoding of 3D representations as a key factor in human-aligned evaluation. Fine-tuning the open-weight Qwen-2.5-VL-7B model, with the visual encoder adapted and the language model frozen, substantially outperforms existing pre-trained VLMs across multiple evaluation dimensions.

Key Points

DB-3DME contains 2,619 synthetic 3D meshes rated on Geometry and Prompt Adherence.
Visual encoding of 3D representations is crucial for human-aligned evaluation performance.
A fine-tuned Qwen-2.5-VL-7B substantially outperforms existing pre-trained VLMs.
The benchmark dataset is publicly available on GitHub and Hugging Face.
The work targets known limitations of current 3D asset evaluation methods: cost, scalability, resolution handling, and task-specific alignment.
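
One common way to quantify how closely an automatic judge tracks human ratings is a rank correlation. The summary does not say which alignment metric DB-3DME reports, so the following is only a minimal sketch assuming Spearman correlation. The function names and the scores are illustrative, not taken from the benchmark.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Made-up scores for five meshes on the Geometry dimension (assumption:
# both humans and the automatic judge produce one numeric score per mesh).
human_geometry = [4, 2, 5, 1, 3]
judge_geometry = [3.5, 2.9, 4.8, 1.2, 2.0]

rho_geometry = spearman(human_geometry, judge_geometry)
print(f"Geometry alignment (Spearman): {rho_geometry:.2f}")
```

The same function could be applied separately to Prompt Adherence scores, giving one alignment number per rated dimension.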

Article Content

From source RSS / original summary

arXiv:2606.10142v1. Abstract: Recent advances in 3D generation have led to substantial improvements in realism, controllability, and efficiency, yet the evaluation of 3D assets remains underexplored. Existing evaluation paradigms, including human evaluation, learned metrics, and vision-language models (VLMs) as judges, suffer from limitations in cost, scalability, resolution handling, or task-specific alignment.

In this work, we focus on 3D mesh evaluation and introduce DB-3DME, the Dataset and Benchmark for 3D Mesh Evaluation. DB-3DME contains 2,619 synthetic 3D meshes paired with human ratings on Geometry and Prompt Adherence. Using this dataset, we systematically benchmark state-of-the-art VLMs and identify visual encoding of 3D representations as a key factor for human-aligned evaluation performance.

Motivated by this finding, we fine-tune an open-weight VLM, Qwen-2.5-VL-7B, for 3D mesh evaluation by adapting the visual encoder while freezing the language model. The fine-tuned model substantially outperforms existing pre-trained VLMs across multiple evaluation dimensions, establishing a new benchmark for automatic 3D mesh evaluation. We publicly release the benchmark dataset on GitHub and Hugging Face to facilitate future research.
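
The recipe of adapting the visual encoder while freezing the language model can be pictured as toggling which parameters receive gradient updates. The sketch below is an illustration under stated assumptions, not the authors' training code: the helper `freeze_all_but` and the parameter names are hypothetical, and stand-in objects replace real tensors so the example runs without a deep learning framework. In PyTorch, the same helper could be given `model.named_parameters()`, which yields `(name, parameter)` pairs with a settable `requires_grad` flag.

```python
from types import SimpleNamespace


def freeze_all_but(named_params, trainable_prefixes):
    """Mark a parameter trainable only if its name starts with a given prefix.

    `named_params` is any iterable of (name, param) pairs where `param` has a
    settable `requires_grad` attribute. Returns the names left trainable.
    """
    prefixes = tuple(trainable_prefixes)
    trainable = []
    for name, param in named_params:
        # True for names under a trainable prefix, False (frozen) otherwise.
        param.requires_grad = name.startswith(prefixes)
        if param.requires_grad:
            trainable.append(name)
    return trainable


# Stand-in parameters. These names are hypothetical and are NOT the real
# Qwen-2.5-VL module names; check the actual model before reusing prefixes.
toy_params = {
    "visual.blocks.0.weight": SimpleNamespace(requires_grad=True),
    "visual.merger.weight": SimpleNamespace(requires_grad=True),
    "language_model.layers.0.weight": SimpleNamespace(requires_grad=True),
    "lm_head.weight": SimpleNamespace(requires_grad=True),
}

# Assumption: everything under "visual." belongs to the visual encoder.
trainable_names = freeze_all_but(toy_params.items(), ["visual."])
print("trainable:", trainable_names)
```

Selecting by name prefix keeps the choice of trainable components in one place, so a different split (for example, unfreezing only part of the encoder) needs only a different prefix list.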
