Beyond Points: Spherical Distributional Part Prototypes for Interpretable Classification
Quick Answer
The vMFProto framework classifies images by modeling each class as a mixture of von Mises-Fisher components, which improves interpretability by accounting for intra-class variability.
Quick Take
vMFProto replaces point prototypes with a per-class mixture of von Mises-Fisher components, so each part prototype can capture its own intra-class variability. With frozen DINO backbones, it reports state-of-the-art explanation quality on CUB-200-2011, Stanford Dogs, and Stanford Cars while keeping accuracy competitive. Training follows a two-stage schedule.
Key Points
- vMFProto models classes as mixtures of von Mises-Fisher components on the hypersphere.
- Achieves state-of-the-art explanation quality, measured by consistency, stability, and distinctiveness.
- Utilizes entropic optimal transport for structured patch-to-prototype assignments.
- Demonstrated effectiveness on CUB-200-2011, Stanford Dogs, and Stanford Cars datasets.
- Two-stage training includes prototype discovery and end-to-end refinement.
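For readers unfamiliar with the distribution named in the points above, the standard von Mises-Fisher density on the unit hypersphere is given below, followed by a class-conditional mixture built from it. The density is the textbook form; the mixture weights \pi_{c,k} and the indexing by class c and component k are notation assumed here for illustration and may differ from the paper's.

```latex
f(\mathbf{x};\boldsymbol{\mu},\kappa)
  = C_d(\kappa)\,\exp\!\left(\kappa\,\boldsymbol{\mu}^{\top}\mathbf{x}\right),
\qquad
C_d(\kappa) = \frac{\kappa^{d/2-1}}{(2\pi)^{d/2}\, I_{d/2-1}(\kappa)},
\qquad
\|\mathbf{x}\| = \|\boldsymbol{\mu}\| = 1,\ \kappa > 0

% Assumed mixture notation (not taken from the paper):
p(\mathbf{x}\mid c) = \sum_{k=1}^{K} \pi_{c,k}\, f(\mathbf{x};\boldsymbol{\mu}_{c,k},\kappa_{c,k}),
\qquad
\sum_{k=1}^{K} \pi_{c,k} = 1
```

Here I_{\nu} is the modified Bessel function of the first kind. A larger concentration \kappa means the component is more tightly peaked around its mean direction \boldsymbol{\mu}, which is how a per-prototype \kappa can express part-specific variability.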
Paper Resources
Abstract: Prototype-based neural networks aim to provide intrinsic interpretability by grounding predictions in a small set of part prototypes. However, modern vision backbones typically operate in normalized, directional embedding spaces where each semantic part exhibits substantial intra-class variability. As a result, point prototypes often become redundant or unstable, hurting both explanation quality and robustness. We propose vMFProto, a distributional part-prototype framework that models each class as a mixture of von Mises-Fisher components on the hypersphere. Each prototype learns its own concentration, capturing part-specific variability, and we use entropic optimal transport (OT) to obtain structured patch-to-prototype assignments. A two-stage training schedule performs OT-driven prototype discovery followed by end-to-end refinement with patch-level distillation and distribution-aware diversity regularization. Experiments on CUB-200-2011, Stanford Dogs, and Stanford Cars with frozen DINO backbones show that vMFProto achieves state-of-the-art explanation quality (consistency, stability, and distinctiveness) with competitive accuracy. Qualitative results confirm that vMFProto yields localized, non-redundant part evidence.
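The abstract's patch-to-prototype assignment step can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes uniform marginals over patches and prototypes, uses the unnormalized vMF log-density (concentration times cosine similarity) as the affinity, and runs a generic log-domain Sinkhorn iteration. The function names, the `epsilon` value, and the toy data are all hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Project row vectors onto the unit hypersphere."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def vmf_logits(patches, means, kappas):
    """kappa_k * <mu_k, x_n> for every patch n and prototype k.

    Assumption: the log-normalizer log C_d(kappa_k) is omitted for brevity.
    It differs per prototype, so a full likelihood would need to add it.
    """
    return (patches @ means.T) * kappas[None, :]

def sinkhorn(scores, epsilon=1.0, n_iters=200):
    """Entropic OT plan (N x K) with uniform row and column marginals (assumed)."""
    n, k = scores.shape
    log_r = np.full(n, -np.log(n))   # target mass per patch
    log_c = np.full(k, -np.log(k))   # target mass per prototype
    log_kernel = scores / epsilon
    f = np.zeros(n)
    g = np.zeros(k)
    for _ in range(n_iters):
        # Alternate scaling updates in the log domain for stability.
        f = log_r - logsumexp(log_kernel + g[None, :], axis=1)
        g = log_c - logsumexp(log_kernel + f[:, None], axis=0)
    return np.exp(log_kernel + f[:, None] + g[None, :])

# Toy data: 49 patch embeddings (e.g. a 7x7 grid) and 5 prototypes in 16-D.
rng = np.random.default_rng(0)
patches = l2_normalize(rng.normal(size=(49, 16)))
means = l2_normalize(rng.normal(size=(5, 16)))
kappas = np.array([5.0, 8.0, 10.0, 12.0, 20.0])  # one concentration per prototype

plan = sinkhorn(vmf_logits(patches, means, kappas))
assignments = plan.argmax(axis=1)  # hard prototype index per patch
```

Because the column update runs last, each prototype receives an equal share of the total mass in `plan`, which is one way a transport constraint can discourage several prototypes from collapsing onto the same patches. Whether the paper uses uniform marginals or a different cost is not stated in the abstract.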
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2606.27582 [cs.CV] (or arXiv:2606.27582v1 [cs.CV] for this version)
DOI: https://doi.org/10.48550/arXiv.2606.27582 (arXiv-issued DOI via DataCite)
Submission history
From: Carlos Santiago
[v1] Thu, 25 Jun 2026 22:16:41 UTC (68,271 KB)
— Originally published at arxiv.org