Stable and Near-Reversible Diffusion ODE Solvers for Image Editing
Quick Take
The paper proposes near-reversible ODE solvers for stable image editing, addressing the instability that exactly reversible solvers show under large edits.
Key Points
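- Algebraically reversible ODE solvers remove the inversion error of DDIM-based editing pipelines.
- Exact reversibility trades off against numerical stability, which shows up as a trade-off between background preservation and prompt alignment.
- Near-reversible Runge-Kutta methods, combined with vector-field smoothing, stay stable under large edits and improve edit fidelity.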
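A toy round-trip experiment can make the first point concrete. This is a minimal sketch under stated assumptions, not the paper's method: the scalar field `f` is a hypothetical stand-in for a diffusion ODE drift, the Euler round trip only mimics the mismatch in naive DDIM inversion (each direction evaluates the field at its own current point, so the two passes are not exact inverses), and the reversible scheme is the reversible Heun method of Kidger et al. (2021), chosen as one well-known algebraically reversible solver. All function names are illustrative.

```python
import math


def f(y: float) -> float:
    """Hypothetical toy vector field standing in for a diffusion ODE drift."""
    return -y + math.sin(3.0 * y)


def euler_roundtrip(y0: float, h: float, n: int) -> float:
    """Explicit Euler out and back; each pass evaluates f at its current point,
    loosely analogous to the mismatch in naive DDIM inversion."""
    y = y0
    for _ in range(n):
        y = y + h * f(y)
    for _ in range(n):
        y = y - h * f(y)
    return y


def reversible_heun_step(y: float, y_hat: float, h: float) -> tuple[float, float]:
    """One forward step of reversible Heun on the state pair (y, y_hat)."""
    y_hat_next = 2.0 * y - y_hat + h * f(y_hat)
    y_next = y + 0.5 * h * (f(y_hat) + f(y_hat_next))
    return y_next, y_hat_next


def reversible_heun_unstep(y_next: float, y_hat_next: float, h: float) -> tuple[float, float]:
    """Algebraic inverse of reversible_heun_step: recover y_hat first, then y."""
    y_hat = 2.0 * y_next - y_hat_next - h * f(y_hat_next)
    y = y_next - 0.5 * h * (f(y_hat) + f(y_hat_next))
    return y, y_hat


def reversible_roundtrip(y0: float, h: float, n: int) -> float:
    """Run n reversible Heun steps forward, then undo them exactly."""
    y, y_hat = y0, y0
    for _ in range(n):
        y, y_hat = reversible_heun_step(y, y_hat, h)
    for _ in range(n):
        y, y_hat = reversible_heun_unstep(y, y_hat, h)
    return y


y0, h, n = 1.0, 0.05, 20
euler_err = abs(euler_roundtrip(y0, h, n) - y0)
rev_err = abs(reversible_roundtrip(y0, h, n) - y0)
print(f"Euler round-trip error:           {euler_err:.2e}")
print(f"Reversible Heun round-trip error: {rev_err:.2e}")
```

The Euler round trip leaves a discretization-level error, while the reversible scheme returns to `y0` up to floating-point roundoff, because each backward step solves the forward update exactly rather than approximating it.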
Abstract: The inversion of diffusion models plays a central role in image editing. Algebraically reversible ODE solvers provide an appealing approach to diffusion inversion for text-guided image editing, by eliminating the inversion error inherent in DDIM-based editing pipelines. However, empirical results indicate that reversibility alone is insufficient. As edits require larger semantic or visual changes, reversible diffusion solvers often exhibit instabilities and suffer sharp drops in output quality. In this paper, we show that the trade-off between exact reversibility and numerical stability manifests empirically as a trade-off between background preservation and prompt alignment in image editing. We then investigate the use of near-reversible Runge-Kutta methods as a more stable alternative to exactly reversible diffusion schemes. When combined with a vector-field smoothing strategy, the resulting approach improves edit fidelity, remains stable under large edits, and largely retains the background-preservation benefits of reversible solvers.
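The reversibility/stability trade-off can be seen on the linear test equation y' = λy with z = λh. As an assumed example of an algebraically reversible scheme, take reversible Heun (Kidger et al., 2021), which carries a state pair (y, ŷ). One step is a 2×2 linear map with trace 2z and determinant −1, so its two eigenvalues multiply to −1 and, for any real z ≠ 0, one of them has modulus above 1. The standard (non-reversible) Heun method, by contrast, damps decaying modes for moderate negative z. This sketch only illustrates the general phenomenon; the paper's solvers and analysis may differ, and the function names are hypothetical.

```python
import cmath


def reversible_heun_spectral_radius(z: complex) -> float:
    """Spectral radius of reversible Heun's one-step map for y' = lam*y, z = lam*h.

    On the state (y, y_hat) one step is the linear map
        [[1 + z, z**2 / 2],
         [2,     z - 1   ]],
    with trace 2z and determinant -1, so its eigenvalues solve
    mu**2 - 2*z*mu - 1 = 0, i.e. mu = z +/- sqrt(z**2 + 1).
    """
    root = cmath.sqrt(z * z + 1)
    return max(abs(z + root), abs(z - root))


def heun_amplification(z: complex) -> float:
    """|R(z)| for standard (non-reversible) Heun: R(z) = 1 + z + z**2 / 2."""
    return abs(1 + z + z * z / 2)


# Decaying modes (real negative z) and one purely oscillatory mode (z = 0.5j).
for z in (-0.1, -0.5, -1.0, 0.5j):
    print(
        f"z = {z}: reversible Heun rho = {reversible_heun_spectral_radius(z):.3f}, "
        f"Heun |R| = {heun_amplification(z):.3f}"
    )
```

For z = −0.5 the reversible scheme's spectral radius is the golden ratio (about 1.618) while Heun's amplification factor is 0.625; for purely imaginary z with |z| ≤ 1, such as z = 0.5i, both reversible-Heun eigenvalues have modulus exactly 1.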
| Subjects: | Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG) |
| Cite as: | arXiv:2605.16399 [cs.CV] |
| | (or arXiv:2605.16399v1 [cs.CV] for this version) |
| | https://doi.org/10.48550/arXiv.2605.16399 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Barbora Barancikova
[v1] Tue, 12 May 2026 18:34:14 UTC (31,140 KB)
— Originally published at arxiv.org