Not All NVFP4 QAT Recipes Are Equal: How Architecture and Scale Shape Model Quality for Anomaly Segmentation
Quick Take
The study finds that model architecture has the largest influence on quantization robustness in anomaly segmentation under FP4 quantization-aware training (QAT). The attention-based Swin Transformer stays robust to recipe choice across all scales, while CNNs degrade under gradient-quantizing recipes at larger scales. The result underscores how much architecture choice matters for low-precision inference on a recall-critical brain tumor segmentation task.
Key Points
- Swin Transformer shows resilience to QAT recipe variations across all scales.
- CNNs degrade in performance under gradient-quantizing recipes at larger scales.
- At larger scales, advanced QAT recipes mitigate the gradient quantization noise that degrades CNN quality.
- Five-fold patient-level cross-validation confirms the findings are robust to data partition.
- Attention-based architectures are more resilient than CNNs to the choice of FP4 QAT recipe.
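To make the quantization noise in the key points concrete, here is a minimal sketch of FP4 "fake quantization" as it is commonly simulated during QAT. It is not the paper's recipe. It assumes the E2M1 value grid used by 4-bit float formats and one scale per block of 16 values; the function name `fake_quant_fp4` is hypothetical, the scale is kept in full precision rather than stored in FP8, and ties round toward the smaller magnitude for simplicity.

```python
# Magnitudes representable by a 4-bit E2M1 float (the sign is a separate bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def fake_quant_fp4(values, block_size=16):
    """Quantize then dequantize: snap each block to the scaled E2M1 grid.

    Assumption: one full-precision scale per block, chosen so that the
    largest magnitude in the block maps to 6.0 (the top of the grid).
    """
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        scale = amax / E2M1_GRID[-1] if amax > 0 else 1.0
        for v in block:
            # Nearest grid magnitude; ties go to the smaller value.
            mag = min(E2M1_GRID, key=lambda g: abs(abs(v) / scale - g))
            out.append(-mag * scale if v < 0 else mag * scale)
    return out


# Illustrative input: a row of attention-like weights with one dominant entry.
weights = [0.4, 0.2, 0.1] + [0.3 / 13] * 13
quantized = fake_quant_fp4(weights)
print(sorted(set(round(q, 4) for q in quantized)))
```

With only eight magnitudes per block, many nearby inputs collapse onto the same level, which is one way to picture how a small model's attention weights could lose resolution at this precision.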
Article Excerpt
From source RSS / original summary
arXiv:2605.27616v1 Announce Type: new Abstract: Real-time anomaly segmentation demands both high recall and efficient low-precision inference. We study the three-way interaction of model architecture, model scale, and FP4 quantization-aware training (QAT) recipe on a recall-critical brain tumor segmentation task, evaluating multiple architectures, scales, and QAT recipes under a unified protocol.
We find that architecture choice has the largest impact on quantization robustness, with attention-based architectures showing remarkable resilience to recipe choice while CNNs degrade under gradient-quantizing recipes at larger scales. At low capacity, FP4 can discretize softmax attention, but advanced QAT recipes prevent this collapse. At larger scales, advanced recipes mitigate gradient quantization noise that degrades CNN quality.
Five-fold patient-level cross-validation confirms these findings are robust to data partition. Our results show that the Swin Transformer is robust to QAT recipe choice across all scales, making it the recommended architecture for FP4-quantized anomaly segmentation.
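The patient-level split mentioned in the abstract can be sketched as follows. This is an illustrative version, not the authors' protocol: the helper `patient_level_folds`, the round-robin fold assignment, and the seed are all assumptions. The point it shows is that every slice from a given patient lands on the same side of the split, so validation scores are not inflated by near-duplicate slices from patients seen in training.

```python
import random


def patient_level_folds(patient_ids, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs with no patient on both sides.

    patient_ids[i] is the patient that sample i (e.g. one slice) came from.
    """
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)
    # Assign whole patients to folds round-robin after shuffling.
    fold_of = {p: i % n_folds for i, p in enumerate(patients)}
    for k in range(n_folds):
        val = [i for i, p in enumerate(patient_ids) if fold_of[p] == k]
        train = [i for i, p in enumerate(patient_ids) if fold_of[p] != k]
        yield train, val


# Illustrative data: 10 hypothetical patients with 3 slices each.
ids = ["p%d" % (i // 3) for i in range(30)]
for train_idx, val_idx in patient_level_folds(ids):
    held_out = sorted({ids[i] for i in val_idx})
    print(len(train_idx), len(val_idx), held_out)
```

Libraries such as scikit-learn offer group-aware splitters for the same purpose; the pure-Python version here only makes the grouping logic explicit.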


