High-Fidelity Two-Step Image Generation via Teacher-Aligned End-to-End Distillation

arXiv cs.CV·Dongyang Liu, Ruoyi Du, David Liu, Dengyang Jiang, Liangchen Li, Qilong Wu, Zhen Li, Steven C. H. Hoi, Hongsheng Li, Peng Gao

6/12/2026

Quick Answer

Z-Image Turbo++ is a high-fidelity 2-step image generation model distilled from the 8-step Z-Image Turbo teacher. It targets the two main bottlenecks of 2-step generation: increased task difficulty and limited model capacity.

Quick Take

Z-Image Turbo++ combines three design choices tailored to the 2-step regime: Distribution-Aligned Adversarial Learning, Step-Decoupled Parameterization, and End-to-End Training with Iterative Regularization. Together they substantially narrow the quality gap between 2-step and 8-step generation in both qualitative and quantitative evaluations.

Key Points

Introduces Z-Image Turbo++, a 2-step image generation model distilled from the 8-step Z-Image Turbo teacher.
Uses teacher-generated images, rather than external real images, as the real samples for GAN training.
Assigns independent model parameters to the two denoising steps.
Trains both steps end to end, with an explicit step-1 loss that keeps the intermediate generation meaningful.
Substantially narrows the quality gap between 2-step and 8-step generation.

Article Content

From source RSS / original summary

arXiv:2606.12575v1 (announce type: new)

Abstract: Few-step diffusion distillation has become increasingly mature for 4-to-8-step generation, yet pushing further to 2 steps remains challenging. In this work, we introduce Z-Image Turbo++, a high-quality 2-step image generation model distilled from the 8-step Z-Image Turbo teacher.

Our method addresses the central bottlenecks of increased task difficulty and limited model capacity in 2-step generation through three simple but effective design choices tailored to this regime. First, we propose Distribution-Aligned Adversarial Learning, which uses teacher-generated images rather than external real images as real samples for GAN training, providing a more attainable and informative adversarial target.
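The idea of using teacher outputs as the discriminator's real samples can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not state which GAN loss is used, so a non-saturating logistic loss stands in, and the names adversarial_batch, teacher_sample, and student_sample are hypothetical placeholders rather than anything from the paper.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def adversarial_batch(prompts, teacher_sample, student_sample):
    # Distribution alignment: the "real" side of the GAN batch comes from
    # the 8-step teacher, not from an external image dataset.
    # teacher_sample / student_sample are hypothetical callables.
    real = [teacher_sample(p) for p in prompts]
    fake = [student_sample(p) for p in prompts]
    return real, fake


def discriminator_loss(real_logits, fake_logits):
    # Assumed loss: non-saturating logistic GAN loss (not from the paper).
    real_term = sum(softplus(-r) for r in real_logits) / len(real_logits)
    fake_term = sum(softplus(f) for f in fake_logits) / len(fake_logits)
    return real_term + fake_term


def generator_loss(fake_logits):
    # The 2-step student is rewarded when the discriminator scores its
    # outputs like teacher outputs.
    return sum(softplus(-f) for f in fake_logits) / len(fake_logits)
```

Because both sides of the batch are generated from the same prompts, the discriminator compares student outputs against a target the student can plausibly reach, which is the motivation the abstract gives for this choice.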

Second, we adopt Step-Decoupled Parameterization, assigning independent model parameters to the two denoising steps to better match their distinct capacity demands. Third, we perform End-to-End Training with Iterative Regularization, allowing the first step to receive gradients from final image quality while preserving a meaningful intermediate generation through an explicit step-1 loss.
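A toy scalar sketch may help show how the second and third design choices fit together. It is not the paper's implementation: each step is reduced to a hypothetical affine map with its own parameters, squared error stands in for the real image-quality objective, and the form and weight of the step-1 loss (step1_weight) are assumptions, since the abstract does not specify them.

```python
class StepParams:
    # Hypothetical per-step parameters; each denoising step owns its own set.
    def __init__(self, scale, bias):
        self.scale = scale
        self.bias = bias


def denoise(params, x):
    # Toy stand-in for one denoising step.
    return params.scale * x + params.bias


def two_step_generate(step1, step2, noise):
    intermediate = denoise(step1, noise)
    final = denoise(step2, intermediate)
    return intermediate, final


def total_loss(step1, step2, noise, target, step1_weight=0.5):
    # Final-image loss plus an explicit step-1 loss on the intermediate.
    # Squared error and step1_weight are assumptions for illustration.
    intermediate, final = two_step_generate(step1, step2, noise)
    final_loss = (final - target) ** 2
    step1_loss = (intermediate - target) ** 2
    return final_loss + step1_weight * step1_loss


def step1_scale_grad(step1, step2, noise, target, step1_weight=0.5):
    # Chain rule: step 1 receives a gradient from the final output (through
    # step2.scale) and a second one from its own step-1 loss.
    intermediate, final = two_step_generate(step1, step2, noise)
    from_final = 2.0 * (final - target) * step2.scale * noise
    from_step1 = 2.0 * step1_weight * (intermediate - target) * noise
    return from_final + from_step1
```

In this sketch, setting step1_weight to 0 leaves only the end-to-end term, so nothing constrains the intermediate output; the step-1 term is what keeps it close to a usable image.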

Together, these designs substantially narrow the quality gap between 2-step and 8-step generation in both qualitative and quantitative evaluations, highlighting the potential of carefully tailored distillation strategies for improving the quality-efficiency trade-off in few-step generation.
