Representation-Conditioned Diffusion Models for Guided Training Data Generation

arXiv cs.CV·Nithesh Chandher Karthikeyan, Jonas Unger, Gabriel Eilertsen

5/28/2026

Quick Take

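Latent diffusion models conditioned on learned representations (DINOv2, DINOv3, CLIP) generate synthetic training images that beat class-conditioned generation and, once the synthetic dataset is scaled up, train a classifier that outperforms one trained on the real data.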

Key Points

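Representation conditioning outperforms class-conditioned generation by +10.76 p.p. top-1 accuracy on ImageNet100.
With a scaled-up synthetic dataset, the classifier beats one trained on the real data by +2.0 p.p. top-1 accuracy.
Generated images used as augmentation outperform classical augmentation methods, and the conditioning space can be used to filter samples and further improve training value.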

Article Content

From source RSS / original summary

arXiv:2605.27495v1. Abstract: Data availability remains a critical bottleneck in many deep learning applications. Large-scale datasets are often expensive to collect, curate and annotate, which can limit the scalability and applicability of supervised learning methods. In this work, we evaluate the classification performance of models trained on synthetic image datasets produced by generative deep learning.

In particular, we use latent diffusion models conditioned on learned representations from DINOv2, DINOv3, and CLIP. Our results demonstrate that this representation-conditioned formulation outperforms class-conditioned generation by a large margin (+10.76 p.p. top-1 accuracy on ImageNet100) by improving sample quality and mode coverage. Furthermore, by scaling the size of the synthetic dataset, we are able to outperform a classifier trained on the real data (+2.0 p.p. top-1 accuracy).
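The gains above are reported in percentage points (p.p.) of top-1 accuracy, that is, the absolute difference between two accuracies expressed in percent. A minimal sketch of that arithmetic follows; the function names and the toy predictions are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence


def top1_accuracy(predicted: Sequence[int], labels: Sequence[int]) -> float:
    """Top-1 accuracy in percent: share of samples whose top prediction is correct."""
    if len(predicted) != len(labels):
        raise ValueError("predicted and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predicted, labels))
    return 100.0 * correct / len(labels)


def gap_in_percentage_points(acc_a: float, acc_b: float) -> float:
    """Absolute difference between two accuracies given in percent."""
    return acc_a - acc_b


# Toy data (hypothetical): two classifiers evaluated on the same four test labels.
labels = [0, 1, 2, 3]
preds_model_a = [0, 1, 2, 0]  # 3 of 4 correct
preds_model_b = [0, 1, 0, 0]  # 2 of 4 correct

acc_a = top1_accuracy(preds_model_a, labels)
acc_b = top1_accuracy(preds_model_b, labels)
print(gap_in_percentage_points(acc_a, acc_b))
```

A gap in percentage points is an absolute difference, so it should not be read as a relative improvement.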

We also demonstrate how generated images can be used for augmentation purposes, outperforming classical augmentation methods, and how the conditioning space can be used for sample filtering to further improve training value. Collectively, these findings highlight that representation-conditioned diffusion models provide a promising approach for augmenting, complementing, or potentially replacing real-world datasets in large-scale visual learning tasks.
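The abstract does not say which criterion is used to filter samples in the conditioning space. As one possible illustration, the sketch below keeps a generated sample only if its embedding is close, by cosine similarity, to the mean embedding of real images from the same class. The centroid criterion, the threshold, and every function name here are assumptions made for this example; in practice the embeddings would come from an encoder such as DINOv2 or CLIP rather than the hand-written 2-D vectors used below.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def class_centroid(embeddings: List[Vector]) -> List[float]:
    """Mean embedding of the real reference images of one class."""
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]


def filter_samples(
    generated: List[Tuple[str, Vector]],
    centroids: Dict[str, Vector],
    threshold: float,
) -> List[int]:
    """Return indices of generated samples whose embedding is near their class centroid."""
    kept = []
    for idx, (label, embedding) in enumerate(generated):
        if cosine_similarity(embedding, centroids[label]) >= threshold:
            kept.append(idx)
    return kept


# Toy 2-D embeddings (hypothetical) standing in for encoder outputs.
real = {
    "cat": [[1.0, 0.0], [0.8, 0.2]],
    "dog": [[0.0, 1.0], [0.2, 0.8]],
}
centroids = {label: class_centroid(embs) for label, embs in real.items()}

generated = [
    ("cat", [0.9, 0.2]),  # close to the cat centroid
    ("cat", [0.1, 1.0]),  # labelled cat but looks like a dog
    ("dog", [0.0, 0.7]),  # close to the dog centroid
]
print(filter_samples(generated, centroids, threshold=0.8))  # prints [0, 2]
```

A stricter threshold discards more off-class samples but also reduces diversity, so the value would need to be tuned against downstream classifier accuracy.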
