High-Fidelity Two-Step Image Generation via Teacher-Aligned End-to-End Distillation
Quick Answer
Z-Image Turbo++ is a high-fidelity 2-step image generation model distilled from the 8-step Z-Image Turbo teacher. It targets the two central bottlenecks of 2-step generation: increased task difficulty and limited model capacity.
Quick Take
The method combines three design choices: Distribution-Aligned Adversarial Learning, Step-Decoupled Parameterization, and End-to-End Training with Iterative Regularization. Together they substantially narrow the quality gap between 2-step and 8-step generation.
Key Points
- Introduces Z-Image Turbo++, a 2-step image generation model.
- Uses teacher-generated images, rather than external real images, as the real samples for GAN training.
- Assigns independent model parameters to each of the two denoising steps.
- Substantially narrows the quality gap between 2-step and 8-step generation.
- Highlights the effectiveness of tailored distillation strategies.
Article Content
From source RSS / original summary (arXiv:2606.12575v1).
Abstract: Few-step diffusion distillation has become increasingly mature for 4-8-step generation, yet pushing further to 2 steps remains challenging. In this work, we introduce Z-Image Turbo++, a high-quality 2-step image generation model distilled from the 8-step Z-Image Turbo teacher.
Our method addresses the central bottlenecks of increased task difficulty and limited model capacity in 2-step generation through three simple but effective design choices tailored to this regime. First, we propose Distribution-Aligned Adversarial Learning, which uses teacher-generated images rather than external real images as real samples for GAN training, providing a more attainable and informative adversarial target.
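The idea of treating teacher outputs as the "real" class can be illustrated with a minimal sketch. The abstract does not say which GAN objective is used, so the non-saturating logistic loss below is an assumption, and the names `discriminator_loss`, `generator_loss`, and `disc` are hypothetical. Images are stood in for by plain lists of floats.

```python
import math
from typing import Callable, List, Sequence

Image = List[float]  # stand-in for an image tensor (assumption for illustration)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def discriminator_loss(
    disc: Callable[[Image], float],
    teacher_images: Sequence[Image],
    student_images: Sequence[Image],
) -> float:
    """Logistic GAN loss where teacher-generated images play the 'real' role.

    No external real dataset appears here: the 8-step teacher's outputs
    are the positive class, the 2-step student's outputs the negative class.
    """
    real_term = sum(softplus(-disc(x)) for x in teacher_images) / len(teacher_images)
    fake_term = sum(softplus(disc(x)) for x in student_images) / len(student_images)
    return real_term + fake_term


def generator_loss(
    disc: Callable[[Image], float],
    student_images: Sequence[Image],
) -> float:
    """Non-saturating loss: the student is pushed toward the teacher's distribution."""
    return sum(softplus(-disc(x)) for x in student_images) / len(student_images)
```

In this sketch the only change from a conventional GAN setup is where the positive samples come from; the loss functions themselves are standard.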
Second, we adopt Step-Decoupled Parameterization, assigning independent model parameters to the two denoising steps to better match their distinct capacity demands. Third, we perform End-to-End Training with Iterative Regularization, allowing the first step to receive gradients from final image quality while preserving a meaningful intermediate generation through an explicit step-1 loss.
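A toy sketch of how the second and third choices fit together, under stated assumptions: each denoising step is reduced to a hypothetical scale-and-shift map with its own parameters, the final-image objective is replaced by a mean squared error against a teacher image, and the step-1 loss is assumed to use the same target with a weight `step1_weight`. None of these specifics come from the abstract; they only show the structure of the combined loss.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StepParams:
    """Parameters owned by a single denoising step (toy stand-in for a network)."""
    scale: float
    shift: float


def denoise(params: StepParams, x: List[float]) -> List[float]:
    return [params.scale * v + params.shift for v in x]


def two_step_generate(
    step1: StepParams, step2: StepParams, noise: List[float]
) -> Tuple[List[float], List[float]]:
    """Run both steps; step 2 consumes step 1's output, so the chain is end to end."""
    intermediate = denoise(step1, noise)
    final = denoise(step2, intermediate)
    return intermediate, final


def mse(a: List[float], b: List[float]) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def total_loss(
    step1: StepParams,
    step2: StepParams,
    noise: List[float],
    teacher_image: List[float],
    step1_weight: float = 0.1,  # hypothetical regularization weight
) -> float:
    """Final-image loss plus an explicit step-1 loss on the intermediate result."""
    intermediate, final = two_step_generate(step1, step2, noise)
    final_loss = mse(final, teacher_image)        # depends on step1 and step2
    step1_loss = mse(intermediate, teacher_image)  # depends on step1 only
    return final_loss + step1_weight * step1_loss
```

Because `final` is computed from `intermediate`, changing the step-1 parameters changes the final-image term even when `step1_weight` is zero, which is the sense in which the first step receives a training signal from final image quality. The step-1 term then keeps the intermediate output from drifting into something meaningless.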
Together, these designs substantially narrow the quality gap between 2-step and 8-step generation in both qualitative and quantitative evaluations, highlighting the potential of carefully tailored distillation strategies for improving the quality-efficiency trade-off in few-step generation.