Tuning-free Instruction-based Video Editing Via Structural Noise Initialization and Guidance

arXiv cs.CV·Song Wu, Xinyu Chen, Qian Wang, Liang Li, Zili Yi, Junlan Feng

5/18/2026

Quick Take

A tuning-free, instruction-based video editing framework improves visual quality using structural noise initialization and guidance.

Key Points

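Introduces a Structural Noise Initialization Strategy (SNIS) that assigns higher noise levels to edited regions and lower noise levels to unedited regions.
Employs a Noise Guidance Mechanism (NGM) that uses the generative model's video prior and the noisy latent to guide denoising.
Reports better visual quality and state-of-the-art performance in the authors' experiments.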


[Submitted on 15 May 2026]


Abstract: Video editing poses a significant challenge. While a series of tuning-free methods circumvent the need for extensive data collection and model training, they often underutilize the rich information embedded within the noisy latent, leading to unsatisfactory results. To address this, we propose a tuning-free, instruction-based video editing framework. We approach video editing from the perspective of the noisy latent: we design a Structural Noise Initialization Strategy (SNIS) to secure a superior editing starting point by assigning higher noise levels to edited regions (to facilitate content change) and lower noise levels to unedited regions (to maintain content consistency). We introduce a Noise Guidance Mechanism (NGM), which leverages the video prior in the generative model and effectively integrates rich information within the noisy latent to guide the denoising process, thereby preserving unedited content and overall visual coherence. Experiments show that our proposed method achieves better visual quality and state-of-the-art performance.
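
The region-dependent noise idea behind SNIS can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not give the formulation, so the function name structural_noise_init, the linear signal/noise blend, the binary edit mask, and the noise levels 0.9 and 0.3 are all assumptions chosen only to show "more noise where the edit applies, less noise elsewhere".

```python
import numpy as np


def structural_noise_init(latent, edit_mask, sigma_edit=0.9, sigma_keep=0.3, seed=0):
    """Blend a clean video latent with Gaussian noise, region by region.

    latent:     array of shape (frames, channels, height, width)
    edit_mask:  array of shape (frames, 1, height, width), 1.0 where the edit applies
    sigma_edit: hypothetical noise level for edited regions (not from the paper)
    sigma_keep: hypothetical noise level for unedited regions (not from the paper)
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(latent.shape)
    # Per-position noise level: high inside the edited region, low outside.
    sigma = edit_mask * sigma_edit + (1.0 - edit_mask) * sigma_keep
    # Assumed linear interpolation between the clean latent and the noise;
    # sigma broadcasts across the channel axis.
    noisy = (1.0 - sigma) * latent + sigma * noise
    return noisy, sigma


# Toy example: 2 frames, 4 latent channels, 8x8 spatial grid.
latent = np.zeros((2, 4, 8, 8))
edit_mask = np.zeros((2, 1, 8, 8))
edit_mask[:, :, 2:6, 2:6] = 1.0  # hypothetical edited region
noisy, sigma = structural_noise_init(latent, edit_mask)
```

In this toy setup the edited square starts much closer to pure noise, leaving the denoiser room to change its content, while the surrounding area keeps most of the original latent. How the actual method obtains the edit region from an instruction, and which noise schedule it uses, is not stated in the abstract.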

Comments:	Accepted by ICIP 2026
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2605.15533 [cs.CV]
	(or arXiv:2605.15533v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2605.15533 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Song Wu
[v1] Fri, 15 May 2026 02:09:06 UTC (866 KB)

— Originally published at arxiv.org
