Making Time Editable in Video Diffusion Transformers
Quick Answer
The proposed methodology enhances pretrained Diffusion Transformers (DiT) by integrating a lightweight temporal module, enabling explicit control over motion speed and temporal structure in video generation.
Quick Take
Because the method only adds a lightweight temporal module to a pretrained DiT, it keeps the model's original generative prior while expanding the range of temporal dynamics that can be controlled. Motion speed and temporal structure become editable without redesigning the backbone architecture.
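The summary does not say how the temporal module is wired into the backbone. A common way to add a module without disturbing a pretrained model is a residual adapter whose output projection starts at zero, so the augmented network initially reproduces the frozen one exactly. The sketch below illustrates that general idea in plain Python. It rests on loudly stated assumptions: `FrozenBlock`, `TemporalAdapter`, the low-rank bottleneck, and the additive speed conditioning are all hypothetical stand-ins, not the paper's architecture.

```python
import random

# Hypothetical sketch, NOT the paper's module: a frozen linear map stands in for a
# pretrained DiT block, and a small residual adapter adds speed-conditioned updates.

def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

class FrozenBlock:
    """Stand-in for a pretrained block; its weights are never updated."""
    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.gauss(0.0, dim ** -0.5) for _ in range(dim)] for _ in range(dim)]

    def __call__(self, x):
        return matvec(self.w, x)

    def num_params(self):
        return sum(len(row) for row in self.w)

class TemporalAdapter:
    """Hypothetical low-rank residual branch conditioned on a speed factor."""
    def __init__(self, dim, rank, seed=1):
        rng = random.Random(seed)
        self.down = [[rng.gauss(0.0, dim ** -0.5) for _ in range(dim)] for _ in range(rank)]
        self.speed_emb = [rng.gauss(0.0, 1.0) for _ in range(rank)]
        # Zero-initialized output projection: the adapter contributes nothing at first.
        self.up = [[0.0] * rank for _ in range(dim)]

    def __call__(self, x, speed):
        # Project down, inject the speed signal additively, project back up.
        h = [hi + speed * ei for hi, ei in zip(matvec(self.down, x), self.speed_emb)]
        return matvec(self.up, h)

    def num_params(self):
        return (sum(len(r) for r in self.down) + len(self.speed_emb)
                + sum(len(r) for r in self.up))

def augmented_block(block, adapter, x, speed):
    """Frozen output plus the adapter's residual update."""
    return [b + d for b, d in zip(block(x), adapter(x, speed))]

if __name__ == "__main__":
    rng = random.Random(2)
    x = [rng.gauss(0.0, 1.0) for _ in range(64)]
    block, adapter = FrozenBlock(64), TemporalAdapter(64, rank=4)
    # At initialization the augmented block matches the frozen block for any speed.
    for speed in (0.5, 1.0, 2.0):
        assert augmented_block(block, adapter, x, speed) == block(x)
    print(block.num_params(), adapter.num_params())  # 4096 516
```

With `dim=64` and `rank=4`, the adapter holds 516 parameters against 4,096 in the frozen block, and it changes the output only once training moves `up` away from zero. In this sketch, zero initialization is what keeps the pretrained prior intact at the start of fine-tuning; the paper may use a different mechanism.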
Key Points
- Introduces a temporal-control methodology for video generation in Diffusion Transformers.
- Augments a pretrained DiT with a lightweight temporal module that adds explicit time editing.
- Enables control over motion speed and temporal structure without redesigning the backbone.
- Preserves the original generative prior while expanding the controllable dynamic range.
- Gives users explicit control over the progression of time and temporal dynamics when editing video.
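The abstract does not specify how motion speed is communicated to the model. One plausible mechanism, assumed here purely for illustration, is to rescale the temporal positions that a transformer's positional encoding sees: a speed factor above 1 spreads frames farther apart on the source timeline (faster motion), and a factor below 1 packs them closer together (slow motion). `frame_positions`, `temporal_encodings`, and the sinusoidal encoding are hypothetical choices; some video DiTs use rotary embeddings instead.

```python
import math

# Hypothetical sketch: a speed factor enters the model by rescaling the temporal
# positions fed to a positional encoding. Not the paper's stated mechanism.

def frame_positions(num_frames, speed):
    """Source-timeline position (in frame units) of each generated frame."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return [i * speed for i in range(num_frames)]

def sinusoidal_embedding(t, dim=8, max_period=10000.0):
    """Standard sinusoidal encoding of a (possibly fractional) position t."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * k / half) for k in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]

def temporal_encodings(num_frames, speed, dim=8):
    """Per-frame temporal encodings for a clip generated at the given speed."""
    return [sinusoidal_embedding(t, dim) for t in frame_positions(num_frames, speed)]

if __name__ == "__main__":
    print(frame_positions(5, 2.0))  # [0.0, 2.0, 4.0, 6.0, 8.0]
    print(frame_positions(5, 0.5))  # [0.0, 0.5, 1.0, 1.5, 2.0]
```

At `speed=0.5`, frame 2 receives exactly the encoding that frame 1 receives at `speed=1.0`, so the model sees a half-rate timeline. Only the positions change and the backbone weights stay untouched, which fits the summary's "no backbone redesign" claim, though the paper may condition speed differently.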
Article Excerpt
arXiv:2606.10183v1 (Announce Type: new)
Abstract: Modern Diffusion Transformers for video generation provide limited control over the progression of time and the editing of temporal dynamics. We propose a temporal-control methodology that extends a pretrained DiT with explicit time editing, allowing control over motion speed and temporal structure without redesigning the backbone.
Its core implementation augments the pretrained model with a lightweight temporal module, preserving the original generative prior while expanding its controllable dynamic range.
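Beyond a single speed factor, temporal-structure edits such as slow-motion segments, freezes, or reversals can be described as a time-warp curve that maps each output frame to a position on the source timeline. The abstract does not say how such edits are specified. `time_warp` and its `(frames, speed)` segment format are hypothetical, offered only as a sketch of what a temporal edit could look like as data.

```python
# Hypothetical sketch: a temporal-structure edit expressed as a time-warp curve.
# Each segment is (number of output frames, playback speed); speed 0 freezes,
# and a negative speed plays the source backwards.

def time_warp(segments):
    """Return the source time (in frame units) sampled by each output frame."""
    times, t = [], 0.0
    for num_frames, speed in segments:
        for _ in range(num_frames):
            times.append(t)
            t += speed  # advance along the source timeline at this segment's rate
    return times

if __name__ == "__main__":
    edit = [(3, 1.0), (4, 0.5), (2, 0.0), (3, -1.0)]  # normal, slow-mo, freeze, reverse
    print(time_warp(edit))
    # [0.0, 1.0, 2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 5.0, 5.0, 4.0, 3.0]
```

A constant speed is the one-segment special case. Assuming the model accepts fractional temporal positions, the resulting curve could be fed to a temporal positional encoding in place of integer frame indices.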