DIVER:Diving Deeper into Distilled Data via Expressive Semantic Recovery
Quick Take
DIVER introduces a dual-stage distillation framework enhancing semantic recovery for improved dataset distillation.
Key Points
- Utilizes pre-trained diffusion models for deeper semantic analysis.
- Improves cross-architecture generalization with efficient processing.
- Code available on GitHub for further exploration.
Abstract: Dataset distillation aims to synthesize a compact proxy dataset that is unreadable or non-raw with respect to the original dataset, serving privacy protection and highly efficient learning. However, previous approaches typically adopt a single-stage distillation paradigm, which learns specific patterns that overfit to a prior architecture, consequently suppressing the expression of semantics and degrading performance across heterogeneous architectures. To address this issue, we propose a novel dual-stage distillation framework called ${\textbf{DIVER}}$, which leverages a pre-trained diffusion model to dive deeper into $\textbf{DI}$stilled data $\textbf{V}$ia $\textbf{E}$xpressive semantic $\textbf{R}$ecovery, an entire process of semantic inheritance, guidance, and fusion. Semantic inheritance distills the high-level semantics of abstract distilled images into the latent space to filter out architecture-specific "noise" and retain the intrinsic semantics. Furthermore, semantic guidance improves the preservation of the original semantics by directing the reverse procedure. Finally, semantic fusion provides semantic guidance only during the concrete phase of the reverse process, preventing semantic ambiguity and artifacts while retaining the guidance information. Extensive experiments validate the effectiveness and efficiency of DIVER in improving classical distillation techniques and significantly improving cross-architecture generalization, with processing time comparable to raw DiT on ImageNet (256$\times$256) and only 4 GB of GPU memory usage. Code is available: this https URL.
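The semantic-fusion idea described above, injecting guidance only during the final "concrete" phase of the reverse diffusion process, can be illustrated with a toy sketch. This is not the authors' code: the denoiser, the guidance signal, and the `concrete_frac` parameter below are all hypothetical placeholders standing in for DIVER's actual diffusion model and semantic guidance.

```python
import numpy as np

def denoise(z, t):
    """Hypothetical denoiser: a placeholder that predicts a small residual."""
    return z * 0.1

def semantic_gradient(z, target):
    """Hypothetical semantic guidance signal pulling the latent toward
    a target semantic latent (stand-in for DIVER's inherited semantics)."""
    return target - z

def reverse_process(z_T, target, steps=50, concrete_frac=0.3, scale=0.2):
    """Toy reverse process: guidance is fused only in the last
    `concrete_frac` of the steps (the 'concrete phase'), so the early,
    abstract phase runs unguided and avoids semantic ambiguity."""
    z = z_T.copy()
    for t in range(steps, 0, -1):
        eps = denoise(z, t)
        if t <= concrete_frac * steps:  # concrete phase: fuse guidance
            eps = eps - scale * semantic_gradient(z, target)
        z = z - eps  # one simplified reverse step
    return z
```

Under these placeholder dynamics, the unguided phase shrinks the latent gradually, while the guided concrete phase pulls it more strongly toward the target semantics; the real method replaces these toys with a pre-trained diffusion model and latent-space semantic guidance.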
| Subjects: | Computer Vision and Pattern Recognition (cs.CV) |
| Cite as: | arXiv:2605.12649 [cs.CV] |
| (or arXiv:2605.12649v1 [cs.CV] for this version) | |
| https://doi.org/10.48550/arXiv.2605.12649 arXiv-issued DOI via DataCite (pending registration) |
Submission history
From: Qianxin Xia [view email]
[v1]
Tue, 12 May 2026 18:55:53 UTC (16,215 KB)
— Originally published at arxiv.org