A Dataset for Dynamic Human Preferences for Vision Language Models

arXiv cs.CV·Hannah Gao (Massachusetts Institute of Technology), Dylan Hadfield-Menell (Massachusetts Institute of Technology), Rachel Ma (Massachusetts Institute of Technology)


·~1 min·6/9/2026·en·0

Quick Answer

This paper presents a new benchmark for evaluating Vision Language Models (VLMs) on dynamic human preferences, focusing on real-time adaptability rather than static capabilities.

Quick Take

An automated pipeline generates the multi-modal dataset used to assess state-of-the-art models. The goal is to measure how well VLMs follow context-specific user preferences that are supplied at inference time rather than learned during training.

Key Points

Introduces a benchmark for dynamic human preferences in VLMs.
Focuses on real-time user preferences rather than static evaluations.
Provides an automated pipeline for generating a multi-modal dataset.
Evaluates state-of-the-art models on this novel benchmark.
Addresses a gap in existing vision-language benchmarks, which largely evaluate static capabilities and generally held preferences.

Article Excerpt

From source RSS / original summary

arXiv:2606.07653v1 Announce Type: new Abstract: Given the increased adoption of Vision Language Models (VLMs) in human-interactive settings, it is important that we evaluate how well these models can adapt to real-time preferences for different users. While an increasing number of vision-language benchmarks have recently been introduced, they focus largely on evaluating static capabilities and generally-held preferences learned from extensive training data.

This work introduces a new benchmark for evaluating the ability of VLMs to understand dynamic human preferences, i.e., preferences that are passed in-context at inference time. We provide an automated pipeline for generating this benchmark with variations on image dependence, a dynamic multi-modal human-preference dataset, and evaluations of state-of-the-art models on the novel benchmark.
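To make the idea of a preference "passed in-context at inference time" concrete, here is a minimal Python sketch. It is not the paper's data schema or evaluation code: the `PreferenceItem` fields, the multiple-choice prompt layout, and the `accuracy` metric are all assumptions chosen for illustration. The image itself would be sent to the VLM separately; only the text side is shown.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

LETTERS = "ABCDEFGH"  # supports up to eight answer options in this sketch


@dataclass(frozen=True)
class PreferenceItem:
    """One hypothetical benchmark item (illustrative fields, not the paper's schema)."""
    image_path: str            # image the model would see alongside the prompt
    preference: str            # user preference stated only in the prompt
    question: str              # task question about the image
    options: Tuple[str, ...]   # candidate answers
    preferred_index: int       # option consistent with the stated preference


def build_prompt(item: PreferenceItem) -> str:
    """Put the preference in the prompt so the model can only learn it in-context."""
    lines = [
        f"User preference: {item.preference}",
        f"Question: {item.question}",
    ]
    lines += [f"{LETTERS[i]}. {opt}" for i, opt in enumerate(item.options)]
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


def accuracy(items: Sequence[PreferenceItem], predictions: Sequence[str]) -> float:
    """Fraction of items where the predicted letter matches the preferred option."""
    if not items:
        return 0.0
    correct = sum(
        1
        for item, pred in zip(items, predictions)
        if pred.strip().upper()[:1] == LETTERS[item.preferred_index]
    )
    return correct / len(items)
```

Because the same image and question can be paired with different preferences, the correct option changes from user to user, which is what distinguishes this setup from a static benchmark with one fixed answer per item.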

