Hybrid Quantum-MambaVision: A Quantum-enhanced State Space Model for Calibrated Mixed-type Wafer Defect Detection
Quick Take
Hybrid Quantum-MambaVision pairs a linear-complexity state-space backbone with a parameterized quantum adapter to classify mixed-type wafer defects and produce better-calibrated confidence estimates.
Key Points
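- Integrates a linear-complexity State-Space Model (SSM) backbone with a Parameterized Quantum Context Adapter (QCA) and Low-Rank Adaptation (LoRA).
- Reduces the error rate on multi-defect patterns in the highly imbalanced MixedWM38 dataset relative to classical baselines.
- Substantially reduces Maximum Calibration Error (MCE), lowering the expected cost of false positives.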
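To make the "linear-complexity" claim for the state-space backbone concrete, here is a minimal sketch of a discrete state-space recurrence. It is not the paper's model: the function name `ssm_scan` and the fixed scalar parameters `a`, `b`, `c` are assumptions for illustration, whereas Mamba-style backbones use input-dependent (selective) parameters, vector-valued states, and a hardware-aware parallel scan. The point is only that each sequence element is visited once, so cost grows linearly with sequence length rather than quadratically as in self-attention.

```python
def ssm_scan(inputs, a, b, c):
    """Run a scalar linear state-space recurrence over a 1-D sequence.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)

    Hypothetical illustration: a, b, c are fixed scalars here, unlike the
    input-dependent parameters of a selective SSM such as Mamba.
    """
    h = 0.0
    outputs = []
    for x in inputs:          # one pass over the sequence: O(L) work
        h = a * h + b * x     # the state carries long-range context forward
        outputs.append(c * h)
    return outputs


# An impulse at the first position decays geometrically through the state.
impulse_response = ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0)
print(impulse_response)  # [2.0, 1.0, 0.5]
```

In a vision setting the input sequence would be a flattened series of patch features from the wafer map; how the paper orders and scans those patches is not stated in the abstract.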
Abstract: Extracting actionable knowledge from industrial visual data is fundamentally bottlenecked by extreme class imbalance and the prohibitive computational complexity of modern foundation models. In semiconductor manufacturing, identifying multi-label wafer defects is a complex spatial data mining task where overlapping patterns obscure critical root-cause signals. While Vision Transformers (ViTs) excel at global dependency extraction, their quadratic scaling renders them inefficient for high-throughput, real-time anomaly detection. To overcome these computational barriers, this paper introduces Hybrid Quantum-MambaVision, a highly efficient architecture tailored for spatial knowledge discovery. We integrate a linear-complexity State-Space Model (SSM) backbone with a Parameterized Quantum Context Adapter (QCA) and Low-Rank Adaptation (LoRA). The Mamba backbone efficiently captures long-range spatial dependencies, while the quantum adapter maps compressed latent features into a high-dimensional Hilbert space to disentangle complex, overlapping signatures. On the highly imbalanced MixedWM38 dataset, Hybrid Quantum-MambaVision achieves exceptional multi-label classification performance, significantly reducing the error rate on complex multi-defect topologies compared to classical baselines. The quantum regularizer acts as a profound uncertainty calibrator, substantially reducing Maximum Calibration Error (MCE) and minimizing expected false-positive costs. This work establishes a scalable Quantum-Classical hybrid paradigm for efficient representation learning in industrial data mining.
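For readers unfamiliar with the calibration metric named in the abstract, the sketch below computes Maximum Calibration Error alongside the more common Expected Calibration Error using equal-width confidence bins. The function name `calibration_errors`, the choice of 10 bins, and the toy data are assumptions; the abstract does not state the paper's binning protocol or how per-label predictions are pooled in the multi-label setting.

```python
def calibration_errors(confidences, correct, n_bins=10):
    """Return (ECE, MCE) over equal-width confidence bins.

    confidences: predicted probabilities in [0, 1]
    correct:     1 if the prediction was right, 0 otherwise
    n_bins:      assumed bin count; the paper's setting is not given
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have equal length")
    if not confidences:
        raise ValueError("at least one prediction is required")

    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, hit))

    total = len(confidences)
    ece, mce = 0.0, 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        gap = abs(avg_conf - accuracy)
        ece += (len(members) / total) * gap  # size-weighted average gap
        mce = max(mce, gap)                  # worst single-bin gap
    return ece, mce


# Toy data: four well-calibrated predictions and two overconfident misses.
ece, mce = calibration_errors(
    [0.95, 0.95, 0.95, 0.95, 0.65, 0.65],
    [1, 1, 1, 1, 0, 0],
)
```

MCE reports the worst bin rather than the average, so a small group of overconfident wrong predictions dominates it. That sensitivity is why it matters when false positives are costly, as in the inspection setting the abstract describes.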
| Subjects: | Computer Vision and Pattern Recognition (cs.CV) |
| Cite as: | arXiv:2605.16404 [cs.CV] (or arXiv:2605.16404v1 [cs.CV] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.16404 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Satwik Sai Prakash Sahoo
[v1] Wed, 13 May 2026 09:20:11 UTC (7,943 KB)
— Originally published at arxiv.org

