DTG-Restore: Training-Free Diffusion Refinement for Generative Video Super-Resolution

arXiv cs.CV·Hidir Yesiltepe, Koutilya PNVR, Gaurav Pathak, Navaneeth Bodla, Bharat Singh, Pinar Yanardag, Jinrong Xie

6/1/2026

Quick Take

DTG-Restore is a training-free framework for generative video super-resolution. It enhances distorted and low-resolution videos by decoupling, in time, the conditional and unconditional branches of classifier-free guidance. The authors report significant improvements in structural fidelity and temporal stability without any model training, and they curate GenWarp480, a benchmark of 4,400 distorted 480p videos, for evaluation.

Key Points

DTG-Restore uses Decoupled Time Guidance (DTG), which evaluates the unconditional branch at a cleaner diffusion timestep.
The timestep bias is annealed during sampling, so the model moves from structure correction to detail refinement without retraining.
The GenWarp480 benchmark includes 4,400 distorted 480p videos synthesized from diverse text-to-video models.
The authors report significant improvements in structural fidelity and temporal stability.
Compatible with off-the-shelf restoration modules in a plug-and-play manner.

Article Content

From source RSS / original summary

arXiv:2605.30431v1. Abstract: Recent progress in video diffusion models has enabled remarkable generative fidelity, yet leveraging these priors for restoration remains limited by the strong coupling between conditional and unconditional branches in standard classifier-free guidance. We introduce a training-free framework that enhances distorted and low-resolution videos by decoupling these signals in time.

Our proposed Decoupled Time Guidance (DTG) evaluates the unconditional branch at a cleaner diffusion timestep, providing a lookahead prior that preserves geometry while suppressing replication of warped content. This temporal bias is annealed throughout sampling, allowing the model to transition from structure correction to detail refinement without retraining.
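The idea in the paragraph above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formula, so the sketch assumes the standard classifier-free guidance combination, a timestep convention where smaller t means less noise, and a linear annealing schedule. The names `annealed_offset`, `dtg_noise_estimate`, `max_offset`, and `toy_denoiser` are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence

# Hypothetical denoiser signature: (latent, timestep, use_condition) -> noise estimate.
Denoiser = Callable[[Sequence[float], float, bool], List[float]]


def annealed_offset(step: int, num_steps: int, max_offset: float) -> float:
    """Assumed schedule: decay the timestep offset linearly from max_offset to 0."""
    if num_steps <= 1:
        return 0.0
    return max_offset * (1.0 - step / (num_steps - 1))


def dtg_noise_estimate(
    denoiser: Denoiser,
    x: Sequence[float],
    t: float,
    step: int,
    num_steps: int,
    guidance_scale: float = 5.0,
    max_offset: float = 0.2,
) -> List[float]:
    """Classifier-free guidance with the unconditional branch at a cleaner timestep."""
    delta = annealed_offset(step, num_steps, max_offset)
    # Assumption: smaller t means less noise, so "cleaner" is t - delta (clamped at 0).
    t_uncond = max(t - delta, 0.0)
    eps_cond = denoiser(x, t, True)            # conditional branch at the current timestep
    eps_uncond = denoiser(x, t_uncond, False)  # unconditional branch, shifted in time
    # Standard guidance combination: uncond + scale * (cond - uncond).
    return [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


def toy_denoiser(x: Sequence[float], t: float, conditional: bool) -> List[float]:
    """Stand-in for a real video diffusion model, used only to exercise the code."""
    bias = 0.5 if conditional else 0.0
    return [t * v + bias for v in x]


if __name__ == "__main__":
    latent = [1.0, 2.0]
    for step in (0, 4):
        eps = dtg_noise_estimate(toy_denoiser, latent, 0.5, step, 5,
                                 guidance_scale=2.0, max_offset=0.25)
        print(step, eps)
```

Under these assumptions the offset is largest at the first sampling step and reaches zero at the last one, at which point the estimate reduces to ordinary classifier-free guidance. This mirrors the transition from structure correction to detail refinement that the abstract describes.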

Combined with any off-the-shelf restoration module in a plug-and-play manner, our approach improves perceptual coherence and restores plausible structure in AI-generated and real-world videos alike. To facilitate evaluation, we curate GenWarp480, a benchmark of 4,400 distorted 480p videos synthesized from diverse text-to-video models.

GenWarp480 focuses on characteristic generative degradations such as warped faces, body misalignments, and spatial artifacts, providing a purpose-built testbed for assessing robustness to generative errors. Extensive experiments demonstrate that our method achieves significant improvements in structural fidelity and temporal stability without any model training.
