NSVQ: Mitigating Codebook Collapse by Stabilizing Encoder Drift in Vector Quantization

arXiv cs.CV·Hao Lu, Yongxin Guo, Onur Koyun, Zhengjie Zhu, Abbas Alili, Metin N. Gurcan

6/11/2026

Quick Answer

This paper shows that the NSVQ method mitigates codebook collapse in vector quantization by addressing encoder drift, improving reconstruction quality on ImageNet-1k.

Quick Take

NSVQ mitigates codebook collapse in vector quantization by helping the codebook keep up with encoder drift. On ImageNet-1k at 128×128 with 65,536 codes, it reduces rFID from 2.39 to 2.10 compared with SimVQ while maintaining full codebook utilization. It also improves downstream generation FID in latent diffusion experiments.

Key Points

NSVQ combines a dense non-stationary embedding loss, codebook replacement, and stage-wise encoder freezing.
The method first helps the codebook track encoder drift during early training, then freezes the encoder to consolidate the codebook under a fixed latent geometry.
In ImageNet-1k experiments, NSVQ maintains full codebook utilization while improving reconstruction quality.
On ImageNet-1k at 128×128 with 65,536 codes, NSVQ reduces rFID from 2.39 to 2.10 compared with SimVQ; both methods maintain 100% utilization.
Latent diffusion experiments show that NSVQ also improves downstream ImageNet generation FID.

Article Content

From source RSS / original summary

arXiv:2606.11363v1. Abstract: Vector quantization is central to modern generative modeling pipelines, but large-codebook VQ models often suffer from codebook collapse. We identify encoder drift as a key driver of this failure: as the encoder moves the latent distribution, sparsely updated code vectors can lag behind, lose assignments, and increase quantization error, creating a feedback loop through the straight-through estimator.
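The drift mechanism described above can be illustrated with a toy example. This is only a sketch under simplifying assumptions, not the paper's setup: latents are one-dimensional, the codebook has four entries, and "drift" is modeled as a constant shift of every latent while the codes stay fixed. The helper names (`nearest_code`, `utilization`, `quantization_error`) are hypothetical.

```python
def nearest_code(z, codebook):
    """Index of the code closest to latent z (squared distance)."""
    return min(range(len(codebook)), key=lambda k: (z - codebook[k]) ** 2)


def utilization(latents, codebook):
    """Fraction of codes that receive at least one assignment."""
    used = {nearest_code(z, codebook) for z in latents}
    return len(used) / len(codebook)


def quantization_error(latents, codebook):
    """Mean squared distance between each latent and its assigned code."""
    total = sum((z - codebook[nearest_code(z, codebook)]) ** 2 for z in latents)
    return total / len(latents)


# Toy data: the codebook initially matches the latent distribution.
codebook = [0.0, 1.0, 2.0, 3.0]
latents = [0.1, 0.9, 2.1, 2.9]

# Assumed drift model: the encoder shifts every latent by +5 and the
# codes do not follow.
drifted = [z + 5.0 for z in latents]

print(utilization(latents, codebook))   # 1.0
print(utilization(drifted, codebook))   # 0.25
print(quantization_error(latents, codebook) < quantization_error(drifted, codebook))  # True
```

After the shift, every latent maps to the single code nearest the new distribution, so the other three codes lose their assignments and the quantization error rises. In a trained model, codes without assignments would also stop receiving updates, which is the lagging behavior the abstract describes.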

We propose NSVQ, a non-stationary-aware VQ training strategy that combines a dense non-stationary embedding loss, codebook replacement, and stage-wise encoder freezing. NSVQ first helps the codebook track encoder drift during early training, then freezes the encoder to consolidate the codebook under a fixed latent geometry, and finally reintroduces adversarial refinement. Experiments on ImageNet-1k show that NSVQ improves reconstruction quality while maintaining full codebook utilization.
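The stage ordering and the codebook replacement step might be sketched as follows. The abstract does not give stage lengths, the replacement criterion, or the form of the dense non-stationary embedding loss, so everything here is an assumption for illustration: the function names, the step boundaries, and the rule "reset any code with zero usage to a randomly chosen encoder output" (a common dead-code heuristic, not necessarily the one NSVQ uses). The embedding loss itself is not sketched.

```python
import random


def stage_for_step(step, drift_steps, freeze_steps):
    """Map a training step to one of three stages (boundaries are hypothetical)."""
    if step < drift_steps:
        return "track_drift"            # codebook follows a moving encoder
    if step < drift_steps + freeze_steps:
        return "consolidate"            # encoder frozen, codebook settles
    return "adversarial_refinement"     # adversarial training reintroduced


def replace_dead_codes(codebook, usage_counts, encoder_outputs, rng):
    """Reset every unused code to a randomly chosen encoder output.

    Assumed heuristic: a code is 'dead' when its usage count is zero.
    Returns a new codebook and the list of replaced indices.
    """
    new_codebook = [list(code) for code in codebook]
    replaced = []
    for k, count in enumerate(usage_counts):
        if count == 0:
            new_codebook[k] = list(rng.choice(encoder_outputs))
            replaced.append(k)
    return new_codebook, replaced


# Usage with toy values.
rng = random.Random(0)
codebook = [[0.0, 0.0], [10.0, 10.0]]
usage_counts = [3, 0]                      # second code never selected
encoder_outputs = [[0.1, 0.2], [0.3, 0.1]]

new_codebook, replaced = replace_dead_codes(
    codebook, usage_counts, encoder_outputs, rng
)
print(replaced)
print(stage_for_step(0, drift_steps=100, freeze_steps=50))
print(stage_for_step(120, drift_steps=100, freeze_steps=50))
print(stage_for_step(500, drift_steps=100, freeze_steps=50))
```

The schedule function only returns a stage label; which parameters are trainable in the final stage is left to the caller, since the abstract does not say whether the encoder is unfrozen during adversarial refinement.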

On ImageNet-1k at 128×128 with 65,536 codes, NSVQ reduces rFID from 2.39 to 2.10 compared with SimVQ, while both methods maintain 100% utilization. Additional latent diffusion experiments show that NSVQ also improves downstream ImageNet generation FID.
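To put the reported numbers in proportion, the short calculation below derives the absolute and relative rFID reduction and the number of bits needed to index a 65,536-entry codebook. The input values come from the abstract; the variable names are illustrative.

```python
import math

rfid_simvq = 2.39    # reported SimVQ rFID
rfid_nsvq = 2.10     # reported NSVQ rFID
num_codes = 65536    # reported codebook size

absolute_drop = rfid_simvq - rfid_nsvq
relative_drop = absolute_drop / rfid_simvq
bits_per_index = math.log2(num_codes)

print(f"absolute rFID reduction: {absolute_drop:.2f}")   # 0.29
print(f"relative rFID reduction: {relative_drop:.1%}")   # 12.1%
print(f"bits per code index: {bits_per_index:.0f}")      # 16
```

The improvement is therefore roughly a 12% relative reduction in rFID at the same 16-bit code index size.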
