DiffRGD: An Inference-Time Diffusion Guidance Through Riemannian Gradient Descent
Quick Answer
DiffRGD introduces a distribution-aware guidance framework for diffusion models that preserves the latent Gaussian structure during inference.
Quick Take
DiffRGD formulates each sampling step as a constrained optimization problem on a spherical manifold induced by the latent Gaussian distribution, and it outperforms previous methods in most image restoration and conditional generation tasks. The method is plug-and-play: it steers pre-trained diffusion models at inference time without retraining.
Key Points
- DiffRGD preserves the original Gaussian distribution of the latent during inference, avoiding the distributional drift that degrades sample quality.
- The method uses Riemannian Gradient Descent for efficient constrained optimization.
- Experiments show that DiffRGD outperforms previous methods in most image restoration and conditional generation tasks.
- DiffRGD can be integrated into any pre-trained diffusion model without costly retraining.
- Code is available at https://github.com/jwliao1209/DiffRGD.
Article Content
From source RSS / original summary
arXiv:2606.28417v1 Announce Type: new
Abstract: Recently, diffusion models have been widely adopted in generative modeling and have served as foundational models for many image generation tasks. To control the generation without costly re-training or fine-tuning, many works seek inference-time guidance methods to steer the latent via a differentiable objective at inference time.
However, these methods cannot effectively preserve the original Gaussian distribution because they introduce distributional drift, thereby degrading the sample quality. To address this gap, we propose DiffRGD, a distribution-aware guidance framework that explicitly preserves the latent Gaussian structure. DiffRGD formulates each sampling step as a constrained optimization problem on a spherical manifold induced by the latent Gaussian distribution, and solves it efficiently via Riemannian Gradient Descent (RGD).
DiffRGD is a plug-and-play method that can be seamlessly integrated into any pre-trained diffusion model. Extensive experiments demonstrate that DiffRGD outperforms previous methods in most image restoration and conditional generation tasks. Our codebase is available at https://github.com/jwliao1209/DiffRGD.
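The abstract describes each guided sampling step as a constrained optimization problem on a sphere, solved with Riemannian Gradient Descent. The sketch below is not the authors' implementation; it only illustrates generic RGD on a sphere and contrasts it with a plain gradient step. Several parts are assumptions made for illustration: the radius is taken as sqrt(d), because a d-dimensional standard Gaussian concentrates near that sphere; the quadratic loss is a stand-in for a real guidance objective; and the names `project_to_tangent`, `retract`, `rgd_step`, and `loss_grad` are hypothetical.

```python
import numpy as np

def project_to_tangent(z, g):
    """Remove the component of g that is normal to the sphere at z."""
    return g - (np.dot(g, z) / np.dot(z, z)) * z

def retract(z, radius):
    """Map a point back onto the sphere of the given radius by rescaling."""
    return radius * z / np.linalg.norm(z)

def rgd_step(z, euclid_grad, step_size, radius):
    """One Riemannian gradient descent step on the sphere ||z|| = radius."""
    riem_grad = project_to_tangent(z, euclid_grad)
    return retract(z - step_size * riem_grad, radius)

def loss_grad(z, target):
    """Gradient of the stand-in loss 0.5 * ||z - target||^2 (an assumption)."""
    return z - target

rng = np.random.default_rng(0)
d = 4096
radius = np.sqrt(d)                      # assumed radius for a N(0, I_d) latent
z0 = retract(rng.standard_normal(d), radius)
target = 3.0 * rng.standard_normal(d)    # deliberately far from the sphere

z_plain, z_rgd = z0.copy(), z0.copy()
for _ in range(50):
    # Unconstrained guidance: the latent norm is free to drift.
    z_plain = z_plain - 0.1 * loss_grad(z_plain, target)
    # Constrained guidance: every iterate stays on the sphere.
    z_rgd = rgd_step(z_rgd, loss_grad(z_rgd, target), 0.1, radius)

plain_norm = np.linalg.norm(z_plain)
rgd_norm = np.linalg.norm(z_rgd)
print(f"sphere radius:        {radius:.2f}")
print(f"plain gradient norm:  {plain_norm:.2f}")
print(f"Riemannian step norm: {rgd_norm:.2f}")
```

In this toy setting the plain gradient iterate follows the target off the sphere, so its norm no longer matches what a Gaussian latent of that dimension would have, while the Riemannian iterate still reduces the loss and keeps its norm fixed. Rescaling is only one possible retraction; the abstract does not say which one DiffRGD uses.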