Can We Predict The Human Preference For Text-to-Image Content Prior To Generation And Is It Even Useful To Do So?
Quick Answer
This study asks whether Human Preference Metric (HPM) scores can be predicted before a Diffusion Model (DM) generates an image, and finds that doing so is feasible and can improve image quality with negligible hardware overhead.
Quick Take
The research indicates that initial random noise significantly impacts output quality, particularly in smaller models.
Key Points
- Diffusion Models (DM) enable high-quality, photorealistic image synthesis from text prompts.
- Initial random noise significantly affects the quality of generated outputs.
- Predicting HPM scores can optimize resource allocation for image generation.
- HPM prediction adds negligible hardware overhead.
- The study identifies which HPMs are most effective for improving image quality.
Article Content
From source RSS / original summary (arXiv:2606.05478v1)
Abstract: Diffusion Models (DM) have revolutionized text-driven generation by enabling the synthesis of high-quality, photorealistic visual content from user prompts. Whereas prior advances in visual generation such as VAEs and GANs were primarily evaluated on perceptual or visual similarity metrics such as FID and PSNR, DM advances have fostered the development of more advanced Human Preference Metrics (HPM) that model and quantify human judgment as scalar values.
However, DMs synthesize content using an inherently stochastic process where random noise seeds generation. The initial random noise directly affects the quality of generated outputs, both qualitatively and quantitatively. This influence is pronounced in smaller models for local deployment scenarios. Given this phenomenon, we first investigate to what extent we can predict scalar HPM scores prior to committing compute resources for generation.
We then investigate to what extent such prediction can be leveraged to improve the quality of generated images, and study which HPMs are best suited for this task. Our investigation reveals that not only is this possible, but that it is feasible with negligible hardware overhead.
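One way to picture the idea in the abstract is as seed selection: score several candidate initial noises with a cheap predictor, then spend generation compute only on the most promising one. The sketch below is illustrative only and is not the paper's method. The names `sample_noise`, `toy_predictor`, and `select_seed` are hypothetical, and the predictor is an arbitrary placeholder standing in for a learned model that would estimate an HPM score from the prompt and the initial noise.

```python
import random
from typing import Callable, List, Sequence, Tuple

Noise = List[float]


def sample_noise(seed: int, dim: int = 16) -> Noise:
    """Draw a Gaussian vector from a seed (stand-in for a DM's initial latent noise)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def toy_predictor(prompt: str, noise: Noise) -> float:
    """Hypothetical placeholder for a learned HPM predictor.

    A real predictor would be trained to estimate a preference score from
    (prompt, noise). This arbitrary formula exists only so the sketch runs.
    """
    mean = sum(noise) / len(noise)
    return -abs(mean) + 0.001 * len(prompt)


def select_seed(
    prompt: str,
    candidate_seeds: Sequence[int],
    predictor: Callable[[str, Noise], float],
    dim: int = 16,
) -> Tuple[int, float]:
    """Return the seed with the highest predicted score, without generating any image."""
    scored = [(predictor(prompt, sample_noise(s, dim)), s) for s in candidate_seeds]
    best_score, best_seed = max(scored)
    return best_seed, best_score


if __name__ == "__main__":
    seed, score = select_seed("a red fox in snow", range(8), toy_predictor)
    print(f"generate with seed={seed} (predicted score {score:.3f})")
```

Under these assumptions, the only extra cost before generation is one predictor call per candidate seed, which is the kind of overhead the abstract describes as negligible when the predictor is small relative to the diffusion model.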