Efficient and Training-Free Single-Image Diffusion Models

arXiv cs.CV·Haojun Qiu, Kiriakos N. Kutulakos, David B. Lindell

6/4/2026

Quick Take

The proposed training-free single-image diffusion model replaces the trained network with a closed-form denoiser computed over a finite dataset of multi-scale patches taken from the reference image. The authors report state-of-the-art quality and diversity compared to trained single-image diffusion models, demonstrate applications including text-guided stylization and image retargeting, and reach megapixel generation in one second.

Key Points

Utilizes a closed-form denoiser for efficient image generation without neural network training.
Achieves state-of-the-art quality and diversity compared to trained single-image diffusion models.
Compatible with latent space diffusion.
Generates megapixel images in one second and gigapixel images in minutes.
Applications include unconditional image generation, text-guided stylization, image symmetrization, and retargeting.

Article Content

From source RSS / original summary

arXiv:2606.04299v1. Abstract: We consider the problem of generating images whose internal structure -- defined by the distribution of patches across multiple scales -- matches that of a single reference image. Recent approaches address this problem by training a diffusion model on a single image. But even in this setting, training is computationally expensive and requires hours of optimization. Instead, we model the image using a dataset of its patches at different scales.
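To make the idea of "a dataset of its patches at different scales" concrete, here is a minimal NumPy sketch. It is an illustration only, not the authors' code: the 5x5 patch size, the three scales, the 2x2 average-pooling downsampler, and the function names are all assumptions chosen for brevity.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def downsample2x(img):
    """Halve resolution by 2x2 average pooling (an assumed, simple choice)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]  # crop to even size so the pooling blocks tile exactly
    return img.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))

def patch_dataset(img, patch=5):
    """Return every overlapping patch x patch window of an (H, W, C) image,
    flattened to one row per patch."""
    # Windows over the two spatial axes: shape (H-p+1, W-p+1, C, p, p).
    windows = sliding_window_view(img, (patch, patch), axis=(0, 1))
    return windows.reshape(-1, img.shape[2] * patch * patch)

def multiscale_patch_datasets(img, num_scales=3, patch=5):
    """Build one patch dataset per scale, finest scale first."""
    datasets = []
    for _ in range(num_scales):
        datasets.append(patch_dataset(img, patch))
        img = downsample2x(img)
    return datasets

if __name__ == "__main__":
    image = np.random.rand(32, 48, 3)  # stand-in for the reference image
    for level, data in enumerate(multiscale_patch_datasets(image)):
        print(level, data.shape)
```

Each row is a low-dimensional vector (75 values for a 5x5 RGB patch), and the number of rows is bounded by the pixel count at that scale, which is what keeps the dataset finite and small.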

As this dataset is finite and the dimensionality of its patches is small, the score function for a noisy patch can be computed tractably using an optimal, closed-form denoiser, eliminating the need for neural network training. We integrate this patch-based denoiser into an efficient, training-free image diffusion model, and we describe how our method connects to classical patch-based image restoration techniques.
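The closed-form denoiser mentioned above can be sketched as follows. When clean patches are drawn uniformly from a finite set and corrupted as x_t = x_0 + sigma * noise, the optimal (posterior-mean) denoiser is a softmax-weighted average of the dataset patches, and the score follows from Tweedie's formula. This is a toy sketch under those assumptions: the noise parameterization, the function names, and the simple Euler sampler on isolated patch vectors are illustrative choices, not the paper's implementation, which operates on whole images.

```python
import numpy as np

def closed_form_denoise(noisy, patches, sigma):
    """Posterior mean E[x0 | x_t] for x0 uniform over `patches` (N, D)
    and x_t = x0 + sigma * noise, evaluated at each row of `noisy` (M, D)."""
    # Squared distances between every noisy query and every clean patch.
    d2 = (np.sum(noisy**2, axis=1, keepdims=True)
          - 2.0 * noisy @ patches.T
          + np.sum(patches**2, axis=1))
    logits = -d2 / (2.0 * sigma**2)
    logits -= logits.max(axis=1, keepdims=True)  # stabilise the softmax
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ patches

def score(noisy, patches, sigma):
    """Score of the noisy patch distribution via Tweedie's formula."""
    return (closed_form_denoise(noisy, patches, sigma) - noisy) / sigma**2

def sample(x_init, patches, sigmas):
    """Toy deterministic sampler: Euler steps along a decreasing noise
    schedule `sigmas`, using the closed-form denoiser at every step."""
    x = np.array(x_init, dtype=float)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = closed_form_denoise(x, patches, sigma)
        x = x + (sigma_next - sigma) * (x - denoised) / sigma
    return x

if __name__ == "__main__":
    data = np.array([[0.0, 0.0], [10.0, 10.0]])  # two toy "patches"
    print(sample([[1.0, 2.0]], data, [5.0, 1.0, 0.1, 0.0]))
```

No parameters are fitted anywhere: the cost per query is one pass over the patch dataset, which is the sense in which the score is tractable without training. At low noise the weights concentrate on the nearest patch, and at high noise the output approaches the dataset mean.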

Our approach achieves state-of-the-art generation quality and diversity compared to trained single-image diffusion models, and we demonstrate applications, including unconditional image generation, text-guided stylization, image symmetrization, and retargeting. Further, we show that our approach is compatible with latent space diffusion, and we show multiple additional acceleration techniques to achieve megapixel single-image generation in one second, and gigapixel generation in minutes.

