MIBE: Multi-subject Interaction Benchmark and Evaluator for Personalized Image Generation

arXiv cs.CV · Zhihan Chen, Yuhuan Zhao, Yijie Zhu, Xinyu Yao, Mengcong Ren, Suwen Wang, Qiuyang Yin, Yuchen Sun, Qin Wang, Lu Xin

7/3/2026

Quick Answer

This paper introduces the Multi-subject Interaction Benchmark and Evaluator (MIBE), a framework for evaluating multi-subject personalized image generation. It targets errors that current generators make and that single-subject metrics fail to detect.

Quick Take

The Multi-subject Interaction Benchmark and Evaluator (MIBE) is a framework for evaluating multi-subject personalized image generation, where current models often omit subjects, fail to preserve reference appearances, or misattribute interactions. Its benchmark combines a 60K-pair Silver Set, which reaches 95.1% cross-VLM preference agreement, with a 4K-pair double-blind, human-evaluated Gold Set. The Multi-subject Interaction Evaluator (MIE) reaches 0.922 pairwise accuracy against human preferences, outperforming baseline metrics such as CLIP and DINO variants.

Key Points

MIBE features a 60K-pair Silver Set for scalable metric training.
The Gold Set includes 4K pairs for double-blind human evaluation.
MIE achieves 0.922 pairwise accuracy against human preferences.
MIE outperforms baseline metrics like CLIP and DINO.
The framework targets the evaluation of multi-subject interaction fidelity in image generation.
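Pairwise accuracy, as it is commonly defined for learned evaluators, is the fraction of human-judged image pairs in which the evaluator scores the human-preferred image higher. The summary does not spell out MIBE's exact protocol, so the following is a minimal sketch under that common definition. The function name, its arguments, and the choice to count evaluator ties as disagreements are assumptions, not the paper's code.

```python
from typing import Sequence


def pairwise_accuracy(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    human_prefers_a: Sequence[bool],
) -> float:
    """Fraction of pairs where the evaluator's ordering matches the human choice.

    scores_a[i], scores_b[i]: evaluator scores for the two images in pair i.
    human_prefers_a[i]: True if annotators preferred image A in pair i.
    Assumption: a tie in evaluator scores counts as a disagreement.
    """
    if not (len(scores_a) == len(scores_b) == len(human_prefers_a)):
        raise ValueError("all inputs must have the same length")
    if not scores_a:
        raise ValueError("need at least one pair")
    correct = 0
    for score_a, score_b, prefers_a in zip(scores_a, scores_b, human_prefers_a):
        if score_a == score_b:
            continue  # tie: not counted as correct
        # The evaluator "prefers A" when A scores higher; compare with the human label.
        if (score_a > score_b) == prefers_a:
            correct += 1
    return correct / len(scores_a)
```

For example, `pairwise_accuracy([0.9, 0.2, 0.5, 0.7], [0.1, 0.8, 0.6, 0.3], [True, False, True, True])` returns `0.75`, because the evaluator's ordering matches the human choice on three of the four pairs.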


Article Content

From source RSS / original summary

arXiv:2607.01383v1 · Announce Type: new

Abstract: Multi-subject personalized image generation requires the precise rendering of all requested reference identities and their specified interactions based on a guiding prompt. However, state-of-the-art models still struggle with this process, frequently omitting subjects, failing to preserve reference appearances, or misattributing interactions.

Furthermore, existing metrics designed primarily for single-subject fidelity cannot reliably capture these errors, suffering severe degradation in ranking separability and failing to align with human preference as the subject count increases. To address this gap, we introduce Multi-subject Interaction Benchmark and Evaluator (MIBE), a unified framework comprising a Multi-subject Interaction Benchmark (MIB) and a Multi-subject Interaction Evaluator (MIE).

MIB systematically covers diverse relation types and scene complexities through a decoupled data regime. This regime consists of a 60K-pair VLM-labeled Silver Set for scalable metric training and a 4K-pair double-blind Human Evaluation Gold Set covering a diverse range of state-of-the-art generators. The Silver Set reaches 95.1% cross-VLM preference agreement.
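Cross-VLM preference agreement can be read as how often independent VLM judges pick the same winner for a given image pair. The abstract reports only the 95.1% figure and does not give the number of judges or how agreement is aggregated. This sketch therefore assumes the mean agreement over every pair of judges. The function name, the judge names, and the `"A"`/`"B"` label format are hypothetical.

```python
from itertools import combinations
from typing import Dict, List


def cross_judge_agreement(labels: Dict[str, List[str]]) -> float:
    """Mean pairwise agreement between judges on which image in each pair wins.

    labels maps a (hypothetical) judge name to its per-pair preference, "A" or "B".
    For each pair of judges, agreement is the fraction of items where both picked
    the same winner; the result averages that fraction over all judge pairs.
    """
    judges = list(labels)
    if len(judges) < 2:
        raise ValueError("need at least two judges")
    n = len(labels[judges[0]])
    if n == 0 or any(len(labels[j]) != n for j in judges):
        raise ValueError("every judge must label the same non-empty set of pairs")
    rates = []
    for first, second in combinations(judges, 2):
        same = sum(a == b for a, b in zip(labels[first], labels[second]))
        rates.append(same / n)
    return sum(rates) / len(rates)
```

With two judges that disagree on one of four pairs, for example `{"vlm_1": ["A", "B", "A", "A"], "vlm_2": ["A", "B", "B", "A"]}`, the function returns `0.75`. A unanimity rate (the share of pairs on which all judges agree) is an equally plausible definition, and it is stricter once three or more judges are involved.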

To demonstrate the utility of this benchmark, we present MIE, a lightweight, reference-conditioned evaluator trained exclusively on the Silver Set with a dual-head ranking and diagnosis objective. MIE exhibits strong cross-generator generalization on the Gold Set, achieving 0.922 overall pairwise accuracy against human preference, including 0.982 on seen generators and 0.884 on unseen generators.
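The three reported accuracies also constrain how the Gold Set is split between seen and unseen generators. This holds only under two assumptions that the summary does not state: every Gold Set pair belongs to exactly one of the two groups, and the overall figure is pooled over pairs. Under those assumptions the overall accuracy is a weighted average, and the share $w$ of pairs from seen generators follows directly:

```latex
\mathrm{Acc}_{\mathrm{overall}} = w\,\mathrm{Acc}_{\mathrm{seen}} + (1 - w)\,\mathrm{Acc}_{\mathrm{unseen}}
\quad\Longrightarrow\quad
w = \frac{\mathrm{Acc}_{\mathrm{overall}} - \mathrm{Acc}_{\mathrm{unseen}}}{\mathrm{Acc}_{\mathrm{seen}} - \mathrm{Acc}_{\mathrm{unseen}}}
  = \frac{0.922 - 0.884}{0.982 - 0.884}
  = \frac{0.038}{0.098}
  \approx 0.39
```

The reported accuracies are rounded to three decimals, so this is only a rough estimate. It suggests that about two fifths of Gold Set pairs would come from seen generators and three fifths from unseen ones.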

By outperforming a broad spectrum of baseline metrics, including CLIP and DINO variants, MIE demonstrates that diagnostic supervision can preserve ranking separability and human alignment where traditional evaluators collapse.
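The abstract describes MIE's training objective only as a "dual-head ranking and diagnosis objective". One plausible reading, sketched here as an assumption rather than the paper's formulation, combines two losses: a pairwise logistic (Bradley-Terry style) ranking loss on the two images' scores, and a multi-label binary cross-entropy on diagnostic error flags for a generated image. The three error types mirror the failure modes the abstract lists. The label names, the function names, and the weight `lam` are all hypothetical. A real training loop would compute these losses on tensors in an autodiff framework; plain floats keep the sketch self-contained.

```python
import math
from typing import Sequence

# Hypothetical diagnostic label set, mirroring the failure modes named in the
# abstract; the paper's actual categories may differ.
ERROR_TYPES = ("subject_omitted", "appearance_not_preserved", "interaction_misattributed")


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ranking_loss(score_preferred: float, score_other: float) -> float:
    """Pairwise logistic loss: -log sigmoid(score_preferred - score_other)."""
    return softplus(score_other - score_preferred)


def diagnosis_loss(logits: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross-entropy with logits over the error types.

    labels[k] is 1 if error type k is present in the image, else 0.
    """
    if len(logits) != len(labels) or not logits:
        raise ValueError("logits and labels must be non-empty and equally long")
    # softplus(z) - y * z is the stable form of -[y*log(s) + (1 - y)*log(1 - s)], s = sigmoid(z)
    return sum(softplus(z) - y * z for z, y in zip(logits, labels)) / len(logits)


def dual_head_loss(
    score_preferred: float,
    score_other: float,
    diag_logits: Sequence[float],
    diag_labels: Sequence[int],
    lam: float = 0.5,  # hypothetical weight on the diagnosis head
) -> float:
    """Ranking-head loss plus a weighted diagnosis-head loss for one image."""
    return ranking_loss(score_preferred, score_other) + lam * diagnosis_loss(diag_logits, diag_labels)
```

The ranking term drops toward zero as the preferred image's score pulls ahead of the other. The diagnosis term rewards flagging exactly the errors present, which is one way diagnostic supervision could give the evaluator a signal beyond a single preference bit.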
