Personalized Generative Models for Contextual Debiasing
Quick Take
The study introduces Decoupling Contextual Patterns with Generations (DecoupleGen), a method that enhances text-to-image diffusion models to generate images with rare contexts while maintaining original dataset alignment. This approach improves object classification and recognition tasks on complex scene datasets, demonstrating consistent performance gains over previous methods.
Key Points
- DecoupleGen personalizes diffusion models for generating images in uncommon contexts.
- The method ensures generated images align with original dataset visual details.
- Experiments show consistent improvements in object classification tasks.
- Verification constraints are applied to maintain relevance of augmented data.
- The approach addresses the challenge of recognizing objects in rare scenarios.
Article Content
From source RSS / original summaryarXiv:2605. 26353v1 Announce Type: new Abstract: Different visual patterns appear with different frequencies in the world: e. g. , beach balls appear on sand more often than they do on a road. These statistics are reflected in vision datasets, and as a result trained models more easily recognize objects in common scenarios. However, recognizing a beach ball on a road may arguably be even more important than recognizing it on sand. We study how to mitigate this discrepancy.
Since collecting uncommon images in the real world may be difficult, we explore whether generating images with less frequent contexts can serve as effective training augmentation. A key challenge is guiding generations to remain close to the original dataset distribution while creating diverse images with uncommon contexts.
We introduce Decoupling Contextual Patterns with Generations (DecoupleGen), a method that personalizes text-to-image diffusion models to facilitate coherent synthesis of images with rare contexts while preserving original visual details. The generated images contain semantically meaningful content and remain visually aligned with the original datasets. We further apply verification constraints to ensure relevance of the augmented data.
We evaluate our approach on object classification and recognition tasks on complex scene datasets. Our experiments demonstrate consistent improvements over previous approaches, and our analyses identify factors underlying these improvements.
Reader Mode unavailable (could not extract clean content).
Want this in your inbox every morning?
Daily brief at your local 8am — bilingual EN/中文, free.
More from arXiv cs.CV
See more →Evi-Steer: Learning to Steer Biomedical Vision-Language Models through Efficient and Generalizable Evidential Tuning
Evi-Steer introduces a novel evidential tuning framework for BiomedCLIP, achieving 0.11% parameter updates while enhancing uncertainty-aware fine-tuning. It outperforms state-of-the-art methods across 15 biomedical imaging datasets, proving effective in few-shot learning and domain shifts for clinical applications.