Efficient and Training-Free Single-Image Diffusion Models
Quick Take
The proposed training-free single-image diffusion model generates images from a finite dataset of patches taken from a single reference image. The authors report state-of-the-art generation quality and diversity compared to trained single-image diffusion models, demonstrate applications including text-guided stylization and image retargeting, and reach megapixel generation in one second.
Key Points
- Utilizes a closed-form denoiser for efficient image generation without neural network training.
- Achieves state-of-the-art generation quality and diversity compared to trained single-image diffusion models.
- Compatible with latent space diffusion.
- Generates megapixel images in one second and gigapixel images in minutes.
- Applications include unconditional image generation, text-guided stylization, image symmetrization, and retargeting.
Article Content
From source RSS / original summary: arXiv:2606.04299v1. Announce Type: new. Abstract: We consider the problem of generating images whose internal structure -- defined by the distribution of patches across multiple scales -- matches that of a single reference image. Recent approaches address this problem by training a diffusion model on a single image. But even in this setting, training is computationally expensive and requires hours of optimization. Instead, we model the image using a dataset of its patches at different scales.
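To make the idea of a multi-scale patch dataset concrete, here is a minimal NumPy sketch. It is not the authors' code: the function names `downsample2x` and `patch_dataset`, the 5x5 patch size, the three scales, and the 2x2 average-pooling pyramid are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def downsample2x(img):
    """Halve the resolution by averaging 2x2 blocks (an assumed, simple choice)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]  # crop to even size so the blocks tile exactly
    return img.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))


def patch_dataset(img, patch=5, num_scales=3):
    """Return one (N_s, C * patch * patch) array of flattened patches per scale.

    img is an (H, W, C) float array. Patch size and scale count are
    hypothetical defaults, not values taken from the paper.
    """
    datasets = []
    for _ in range(num_scales):
        # windows has shape (H - patch + 1, W - patch + 1, C, patch, patch)
        windows = sliding_window_view(img, (patch, patch), axis=(0, 1))
        datasets.append(windows.reshape(-1, img.shape[2] * patch * patch))
        img = downsample2x(img)
    return datasets
```

For a 32x32 RGB image this yields 784, 144, and 16 patches at the three scales, each a 75-dimensional vector, which illustrates why the dataset stays finite and each patch low-dimensional.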
As this dataset is finite and the dimensionality of its patches is small, the score function for a noisy patch can be computed tractably using an optimal, closed-form denoiser, eliminating the need for neural network training. We integrate this patch-based denoiser into an efficient, training-free image diffusion model, and we describe how our method connects to classical patch-based image restoration techniques.
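The closed-form denoiser mentioned above can be sketched as follows, under the standard assumption that a noisy patch is a clean patch drawn uniformly from the dataset plus isotropic Gaussian noise of standard deviation sigma. In that setting the optimal (minimum mean-squared-error) denoiser is the posterior mean, a softmax-weighted average of the dataset patches, and the score follows from Tweedie's formula. The names `optimal_denoise` and `score` are hypothetical, and this brute-force version omits whatever acceleration the paper applies.

```python
import numpy as np


def optimal_denoise(x, patches, sigma):
    """Posterior mean E[p | x] for x = p + sigma * noise, p uniform over patches.

    x: (D,) noisy patch; patches: (N, D) clean patches; sigma: noise std > 0.
    """
    sq_dist = np.sum((patches - x) ** 2, axis=1)
    logits = -sq_dist / (2.0 * sigma**2)
    logits -= logits.max()  # subtract the max for a numerically stable softmax
    weights = np.exp(logits)
    weights /= weights.sum()
    return weights @ patches


def score(x, patches, sigma):
    """Score of the noise-smoothed patch distribution via Tweedie's formula."""
    return (optimal_denoise(x, patches, sigma) - x) / sigma**2
```

At small sigma the weights concentrate on the nearest patch, so the denoiser acts like a nearest-neighbour lookup. At large sigma the weights flatten and the output approaches the dataset mean.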
Our approach achieves state-of-the-art generation quality and diversity compared to trained single-image diffusion models, and we demonstrate applications, including unconditional image generation, text-guided stylization, image symmetrization, and retargeting. Further, we show that our approach is compatible with latent space diffusion, and we show multiple additional acceleration techniques to achieve megapixel single-image generation in one second, and gigapixel generation in minutes.