DB-3DME: From Dataset to Benchmark for Human-aligned Automatic 3D Mesh Evaluation
Quick Answer
The DB-3DME benchmark introduces 2,619 synthetic 3D meshes with human ratings, enabling improved evaluation of 3D assets.
Quick Take
Fine-tuning the visual encoder of Qwen-2.5-VL-7B, with the language model frozen, yields a model that substantially outperforms existing pre-trained VLMs across multiple evaluation dimensions for 3D meshes.
Key Points
- DB-3DME contains 2,619 synthetic 3D meshes rated on Geometry and Prompt Adherence.
- Visual encoding of 3D representations is crucial for human-aligned evaluation performance.
- A fine-tuned Qwen-2.5-VL-7B, with its visual encoder adapted and its language model frozen, substantially outperforms existing pre-trained VLMs.
- The benchmark dataset is publicly available on GitHub and Hugging Face.
- Existing evaluation paradigms (human evaluation, learned metrics, and VLMs as judges) are limited by cost, scalability, resolution handling, or task-specific alignment.
Article Content
From source RSS / original summary
arXiv:2606.10142v1 Announce Type: new
Abstract: Recent advances in 3D generation have led to substantial improvements in realism, controllability, and efficiency, yet the evaluation of 3D assets remains underexplored. Existing evaluation paradigms, including human evaluation, learned metrics, and vision-language models (VLMs) as judges, suffer from limitations in cost, scalability, resolution handling, or task-specific alignment.
In this work, we focus on 3D mesh evaluation and introduce DB-3DME, the Dataset and Benchmark for 3D Mesh Evaluation. DB-3DME contains 2,619 synthetic 3D meshes paired with human ratings on Geometry and Prompt Adherence. Using this dataset, we systematically benchmark state-of-the-art VLMs and identify visual encoding of 3D representations as a key factor for human-aligned evaluation performance.
Motivated by this finding, we fine-tune an open-weight VLM, Qwen-2.5-VL-7B, for 3D mesh evaluation by adapting the visual encoder while freezing the language model. The fine-tuned model substantially outperforms existing pre-trained VLMs across multiple evaluation dimensions, establishing a new benchmark for automatic 3D mesh evaluation. We publicly release the benchmark dataset on GitHub and Hugging Face to facilitate future research.
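The abstract does not say how agreement between model scores and human ratings is measured. As a rough illustration only, one common choice for this kind of comparison is Spearman rank correlation, sketched below in plain Python. The function names, the choice of metric, and the sample ratings are all assumptions made for illustration; none of them come from the paper or the DB-3DME release.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(human_scores, model_scores):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(human_scores) != len(model_scores):
        raise ValueError("score lists must have the same length")
    return pearson(average_ranks(human_scores), average_ranks(model_scores))


# Hypothetical Geometry ratings for five meshes (not real DB-3DME data).
human_geometry = [4, 2, 5, 3, 1]
model_geometry = [3.8, 2.9, 4.9, 2.5, 1.2]
rho = spearman(human_geometry, model_geometry)
print(f"Spearman correlation on Geometry: {rho:.2f}")
```

The same function could be applied separately to each rated dimension (Geometry and Prompt Adherence), so that a judge that ranks meshes in the same order as human raters scores close to 1 regardless of the scale it uses.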