Chorus II: Cross-Request Sparsity Reuse for Efficient Image-to-Video Generation

arXiv cs.CV·Hao Liu, Chenghuan Huang, Hao Liu, Xing Cai, Chen Li, Ziyang Ma, Jing Lyu, Nong Xiao, Jiangsu Du

Quick Take

The Chorus II framework introduces cross-request sparsity reuse for image-to-video (I2V) generation: it reuses sparse attention masks from similar historical requests, which avoids almost all online mask-prediction overhead. The default sparsity-reuse configuration reaches a 2.16× speedup while preserving generation quality, easing the cost of serving diffusion models in large-scale deployments.

Key Points

Chorus II reuses sparse attention masks from similar historical requests instead of predicting a mask online for each request.
The default sparsity-reuse configuration achieves a 2.16× speedup while preserving generation quality.
Feature reuse is an optional extension that applies downsampled computation to highly redundant spatiotemporal regions.
Guidance enhancement reinforces image/text conditioning after reuse, mitigating semantic drift and condition-adherence issues.
The work targets the computational cost of serving I2V diffusion models in large-scale deployments.

[Submitted on 23 Jun 2026]

Abstract: Serving diffusion models for image-to-video generation is computationally expensive, posing significant challenges for large-scale deployment. Real I2V workloads often contain similar requests, such as repeated effect templates, related subjects, and recurring shot layouts. Existing cross-request acceleration methods mainly exploit this redundancy through feature reuse. We observe that similar I2V requests also share highly consistent sparse attention patterns, enabling historical sparse masks to serve as request-conditioned priors with almost no online mask-prediction overhead. We propose a cross-request reuse framework centered on sparsity reuse, with feature reuse as an optional extension safeguarded by a lightweight guidance enhancement. Our sparsity reuse is implemented as shared sparse mask reuse, which reuses high-quality sparse masks from similar historical requests to avoid per-request online mask prediction. Optional feature reuse applies downsampled computation to highly redundant spatiotemporal regions, mitigating boundary artifacts while preserving efficiency gains. Guidance enhancement reinforces image/text conditioning after reuse, mitigating semantic drift and condition-adherence issues. Experiments show that default sparsity reuse configuration preserves generation quality with a 2.16× speedup.
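
To make the idea of shared sparse mask reuse concrete, here is a minimal sketch of a cross-request mask cache. It is not the paper's implementation: the abstract does not say how similar requests are matched or how masks are stored, so the cosine-similarity lookup, the `threshold` value, the request embeddings, and the names `MaskCache`, `predict_mask_online`, and `get_mask` are all assumptions made for illustration. The sketch only shows the control flow the abstract describes: reuse a historical mask when a similar request exists, and fall back to online prediction otherwise.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class MaskCache:
    """Hypothetical store of (request embedding, block-sparse mask) pairs."""
    threshold: float = 0.9  # assumed similarity cutoff, not from the paper
    entries: list = field(default_factory=list)

    def lookup(self, emb):
        """Return the mask of the most similar past request, or None."""
        best_sim, best_mask = -1.0, None
        for past_emb, mask in self.entries:
            sim = cosine(emb, past_emb)
            if sim > best_sim:
                best_sim, best_mask = sim, mask
        return best_mask if best_sim >= self.threshold else None

    def store(self, emb, mask):
        self.entries.append((emb, mask))


def predict_mask_online(n_blocks):
    """Stand-in for an expensive per-request mask predictor.

    Returns a banded block mask: block i attends to blocks i-1, i, i+1.
    """
    return [[abs(i - j) <= 1 for j in range(n_blocks)] for i in range(n_blocks)]


def get_mask(cache, emb, n_blocks):
    """Reuse a historical mask if a similar request exists, else predict one."""
    mask = cache.lookup(emb)
    if mask is not None:
        return mask, True  # cache hit: online prediction is skipped
    mask = predict_mask_online(n_blocks)
    cache.store(emb, mask)
    return mask, False


cache = MaskCache(threshold=0.9)
_, hit_a = get_mask(cache, [1.0, 0.0, 0.0], n_blocks=4)    # first request: miss
_, hit_b = get_mask(cache, [0.99, 0.05, 0.0], n_blocks=4)  # similar request: hit
print(hit_a, hit_b)
```

In this sketch the second request skips mask prediction entirely, which is the source of the savings the abstract attributes to sparsity reuse. A real serving system would presumably also need an eviction policy and a way to judge mask quality before storing it; both are left out here.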

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2606.25040 [cs.CV]
	(or arXiv:2606.25040v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2606.25040 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Hao Liu
[v1] Tue, 23 Jun 2026 18:00:55 UTC (4,228 KB)

— Originally published at arxiv.org
