Not All NVFP4 QAT Recipes Are Equal: How Architecture and Scale Shape Model Quality for Anomaly Segmentation

arXiv cs.CV·Zijian Du, Oleg Rybakov

5/28/2026

Quick Answer

Architecture has the largest impact on quantization robustness in anomaly segmentation: the Swin Transformer is robust to FP4 QAT recipe choice at every scale, while the CNN degrades under gradient-quantizing recipes at larger scales.

Quick Take

On a recall-critical brain tumor segmentation task, architecture choice determines how well a model tolerates FP4 quantization-aware training (QAT) more than scale or recipe does. Attention-based models stay resilient across scales and recipes, while the CNN degrades under gradient-quantizing recipes at larger scales. The authors therefore recommend the Swin Transformer for FP4-quantized anomaly segmentation.

Key Points

Swin Transformer is robust to QAT recipe choice across all scales.
The CNN degrades under gradient-quantizing recipes at larger scales.
Advanced QAT recipes mitigate gradient quantization noise at larger scales and prevent FP4 from collapsing softmax attention at low capacity.
Five-fold patient-level cross-validation confirms the findings are robust to data partition.
Among architecture, scale, and recipe, architecture choice has the largest impact on quantization robustness.

Paper Resources

Read Paper (arxiv.org) · View PDF (arxiv.org)

Article Excerpt

From source RSS / original summary

arXiv:2605.27616v1. Abstract: Real-time anomaly segmentation demands both high recall and efficient low-precision inference. We study the three-way interaction of model architecture, model scale, and FP4 quantization-aware training (QAT) recipe on a recall-critical brain tumor segmentation task, evaluating multiple architectures, scales, and QAT recipes under a unified protocol.

We find that architecture choice has the largest impact on quantization robustness, with attention-based architectures showing remarkable resilience to recipe choice while CNN degrades under gradient-quantizing recipes at larger scales. At low capacity, FP4 can discretize softmax attention, but advanced QAT recipes prevent this collapse. At larger scales, advanced recipes mitigate gradient quantization noise that degrades CNN quality.
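To make the "discretized softmax attention" point concrete, here is a minimal pure-Python sketch of FP4 fake quantization. It assumes the E2M1 value grid and the 16-element block scaling used by NVFP4, but keeps the block scale as an ordinary float rather than FP8, and the logits are hypothetical. The summary does not say where the paper applies quantization, so this only illustrates the general effect, not the authors' recipes.

```python
import math

# Magnitudes representable by a 4-bit E2M1 float (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # NVFP4 shares one scale across each block of 16 values


def quantize_block(block):
    """Fake-quantize one block: scale into the E2M1 range, snap, and rescale."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # Simplification (assumption): the scale stays a Python float here, whereas
    # NVFP4 stores block scales in FP8 (E4M3) plus a per-tensor FP32 scale.
    scale = amax / E2M1_GRID[-1]
    out = []
    for v in block:
        # Snap to the nearest grid magnitude (ties go to the smaller one).
        mag = min(E2M1_GRID, key=lambda g: abs(abs(v) / scale - g))
        out.append(math.copysign(mag * scale, v))
    return out


def fake_quantize(values, block_size=BLOCK_SIZE):
    """Apply block-wise fake quantization to a flat list of floats."""
    out = []
    for start in range(0, len(values), block_size):
        out.extend(quantize_block(values[start:start + block_size]))
    return out


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical attention logits for one query over 16 keys.
logits = [4.0, 3.5, 1.0, 0.5] + [0.0] * 12
probs = softmax(logits)
quantized = fake_quantize(probs)
print(len(set(probs)), "distinct probabilities before quantization")
print(len(set(quantized)), "distinct values after E2M1 fake quantization")
print(quantized.count(0.0), "of", len(quantized), "attention weights snapped to zero")
```

With only eight magnitudes per block, small attention weights snap to zero and the rest land on a handful of levels, which is one plausible reading of the collapse the abstract describes at low capacity.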

Five-fold patient-level cross-validation confirms these findings are robust to data partition. Our results show that the Swin Transformer is robust to QAT recipe choice across all scales, making it the recommended architecture for FP4-quantized anomaly segmentation.
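Patient-level cross-validation means every image from a given patient falls in the same fold, so no patient contributes to both training and validation. The following is a minimal sketch of that split in pure Python; the function name, the patient IDs, and the round-robin assignment are illustrative assumptions, not the authors' protocol.

```python
import random
from collections import defaultdict


def patient_level_folds(patient_ids, n_folds=5, seed=0):
    """Yield (train_indices, val_indices) with each patient confined to one fold."""
    by_patient = defaultdict(list)
    for index, pid in enumerate(patient_ids):
        by_patient[pid].append(index)

    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    # Deal patients round-robin so fold sizes differ by at most one patient.
    fold_of = {pid: i % n_folds for i, pid in enumerate(patients)}

    for fold in range(n_folds):
        train, val = [], []
        for pid, indices in by_patient.items():
            (val if fold_of[pid] == fold else train).extend(indices)
        yield sorted(train), sorted(val)


# Hypothetical dataset: 10 patients with 3 slices each.
slice_patients = [f"patient_{p:02d}" for p in range(10) for _ in range(3)]
folds = list(patient_level_folds(slice_patients))
for train_idx, val_idx in folds:
    train_patients = {slice_patients[i] for i in train_idx}
    val_patients = {slice_patients[i] for i in val_idx}
    # No patient may appear on both sides of a split.
    assert train_patients.isdisjoint(val_patients)
```

Splitting by slice instead of by patient would let near-identical neighboring slices leak across the split and inflate validation scores, which is why grouping by patient matters for a claim of robustness to data partition.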
