Personalized Generative Models for Contextual Debiasing

arXiv cs.CV·Xinran Liang, Esin Tureci, Prachi Sinha, Ye Zhu, Vikram V. Ramaswamy, Olga Russakovsky

5/27/2026 · ~1 min read

Quick Take

The study introduces Decoupling Contextual Patterns with Generations (DecoupleGen), a method that personalizes text-to-image diffusion models to generate images of objects in rare contexts while keeping them visually aligned with the original dataset. Used as training augmentation, the generated images improve object classification and recognition on complex scene datasets, with consistent gains over previous approaches.

Key Points

DecoupleGen personalizes diffusion models for generating images in uncommon contexts.
The method ensures generated images align with original dataset visual details.
Experiments show consistent improvements over previous approaches on object classification and recognition tasks on complex scene datasets.
Verification constraints are applied to maintain relevance of augmented data.
The approach addresses the challenge of recognizing objects in rare scenarios.

Article Content

From source RSS / original summary

arXiv:2605.26353v1 Announce Type: new Abstract: Different visual patterns appear with different frequencies in the world: e.g., beach balls appear on sand more often than they do on a road. These statistics are reflected in vision datasets, and as a result trained models more easily recognize objects in common scenarios. However, recognizing a beach ball on a road may arguably be even more important than recognizing it on sand. We study how to mitigate this discrepancy.

Since collecting uncommon images in the real world may be difficult, we explore whether generating images with less frequent contexts can serve as effective training augmentation. A key challenge is guiding generations to remain close to the original dataset distribution while creating diverse images with uncommon contexts.
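To make the frequency imbalance concrete, here is a minimal Python sketch that counts how often an object co-occurs with each context in a set of annotations and lists the under-represented contexts as candidate generation prompts. It is an illustration only, not the paper's procedure: the `rare_contexts` helper, the `max_share` cutoff, the prompt template, and the toy annotations are all assumptions introduced here.

```python
from collections import Counter

def rare_contexts(annotations, target, max_share=0.2):
    """Return contexts holding at most `max_share` of the target's occurrences.

    `annotations` is an iterable of (object, context) pairs. Both the data
    layout and the cutoff are hypothetical choices for this sketch.
    """
    counts = Counter(ctx for obj, ctx in annotations if obj == target)
    total = sum(counts.values())
    if total == 0:
        return []  # the target never appears, so nothing can be ranked
    return sorted(ctx for ctx, n in counts.items() if n / total <= max_share)

# Toy annotations (made-up data): beach balls mostly appear on sand.
toy = (
    [("beach ball", "sand")] * 8
    + [("beach ball", "road")]
    + [("beach ball", "grass")]
    + [("car", "road")] * 5
)

rare = rare_contexts(toy, "beach ball")
# Hypothetical prompt template for a text-to-image model.
prompts = [f"a photo of a beach ball on {ctx}" for ctx in rare]
print(prompts)
```

In this toy data, sand accounts for 8 of the 10 beach-ball annotations, so only the road and grass contexts fall under the cutoff and become candidates for augmentation.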

We introduce Decoupling Contextual Patterns with Generations (DecoupleGen), a method that personalizes text-to-image diffusion models to facilitate coherent synthesis of images with rare contexts while preserving original visual details. The generated images contain semantically meaningful content and remain visually aligned with the original datasets. We further apply verification constraints to ensure relevance of the augmented data.
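The abstract does not say what the verification constraints are. As one plausible reading, the sketch below keeps a generated image only if a scoring function rates the target object above a threshold. The `filter_generated` helper, the `threshold` value, and the lookup-table scorer are hypothetical stand-ins; in practice the scorer might be a pretrained classifier or an image-text similarity model.

```python
def filter_generated(candidates, score_fn, target, threshold=0.5):
    """Keep candidates whose score for `target` is at least `threshold`.

    `score_fn(candidate, target)` is assumed to return a float in [0, 1].
    """
    return [c for c in candidates if score_fn(c, target) >= threshold]

# Hypothetical precomputed scores standing in for a real model's output.
toy_scores = {"img_a": 0.91, "img_b": 0.32, "img_c": 0.67}

def toy_score_fn(image_id, target):
    # A real scorer would inspect the image; this one only looks up a number.
    return toy_scores[image_id]

kept = filter_generated(["img_a", "img_b", "img_c"], toy_score_fn, "beach ball")
print(kept)
```

A stricter threshold trades augmentation volume for relevance, which is the kind of balance a verification step would need to strike.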

We evaluate our approach on object classification and recognition tasks on complex scene datasets. Our experiments demonstrate consistent improvements over previous approaches, and our analyses identify factors underlying these improvements.

