A Neurosymbolic Approach with Epistemic Deep Learning for Hierarchical Image Classification
Quick Take
A neurosymbolic framework enhances hierarchical image classification by integrating epistemic deep learning with fuzzy logic.
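The Quick Take mentions fuzzy logic, and the abstract explains that t-norm conjunctions are used to keep fine- and coarse-grained predictions consistent. The sketch below is illustrative and is not the authors' code. It shows how one might score the rule "fine class f implies its parent coarse class". Each rule is violated to the degree T(μ_f, 1 − μ_parent), which is the t-norm conjunction of "f holds" and "its parent does not". The sketch compares three standard t-norms: product, Gödel (min), and Łukasiewicz. Two things are assumptions here. First, the fine and coarse scores are treated directly as fuzzy membership degrees. Second, the names `consistency` and `consistency_loss` are hypothetical.

```python
"""Sketch: scoring fine -> coarse consistency with t-norm conjunctions.

Assumption (not from the paper): fine- and coarse-level scores in [0, 1]
are read directly as fuzzy membership degrees.
"""
from typing import Callable, Dict

TNorm = Callable[[float, float], float]


def product_tnorm(a: float, b: float) -> float:
    return a * b


def godel_tnorm(a: float, b: float) -> float:
    return min(a, b)


def lukasiewicz_tnorm(a: float, b: float) -> float:
    return max(0.0, a + b - 1.0)


def consistency(
    fine: Dict[str, float],
    coarse: Dict[str, float],
    parent: Dict[str, str],
    tnorm: TNorm,
) -> float:
    """Mean truth degree of the rules fine(f) -> coarse(parent(f)).

    A rule is violated to the degree T(mu_f, 1 - mu_parent): the t-norm
    conjunction of "f holds" and "its parent does not hold" (standard
    negation 1 - x). Satisfaction is 1 minus that violation.
    """
    sats = [1.0 - tnorm(mu, 1.0 - coarse[parent[f]]) for f, mu in fine.items()]
    return sum(sats) / len(sats)


def consistency_loss(fine, coarse, parent, tnorm: TNorm = product_tnorm) -> float:
    # 0 when every rule is fully satisfied; smooth in the inputs for the product t-norm.
    return 1.0 - consistency(fine, coarse, parent, tnorm)


if __name__ == "__main__":
    # Hypothetical two-level hierarchy and scores.
    parent = {"cat": "animal", "dog": "animal", "car": "vehicle"}
    fine = {"cat": 0.7, "dog": 0.2, "car": 0.1}
    coherent = {"animal": 0.9, "vehicle": 0.1}
    incoherent = {"animal": 0.2, "vehicle": 0.8}
    for name, t in [("product", product_tnorm), ("godel", godel_tnorm),
                    ("lukasiewicz", lukasiewicz_tnorm)]:
        print(name,
              round(consistency(fine, coherent, parent, t), 3),
              round(consistency(fine, incoherent, parent, t), 3))
```

The choice of t-norm matters. The product t-norm gives smooth gradients everywhere. Gödel's min passes gradient only through the smaller argument. Łukasiewicz is exactly zero until the two degrees sum past 1, so it ignores mild violations.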
Key Points
- Combines Swin Transformers with focal set reasoning.
- Improves prediction calibration and logical consistency.
- Maintains accuracy while reducing overconfidence.
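Focal set reasoning comes from belief-function (Dempster-Shafer) theory. A mass function assigns probability to *sets* of classes rather than to single classes. Two quantities follow from it:

- Belief, Bel(A), sums the mass on focal sets contained in A.
- Plausibility, Pl(A), sums the mass on focal sets that intersect A.

The gap Pl − Bel is one standard way to express epistemic uncertainty, which a single softmax cannot show because it always has Bel = Pl.

The sketch below rests on hypothetical choices that are not taken from the paper:

- the focal sets, including a learnt "dog-or-wolf" set;
- the softmax-over-focal-set-logits parameterisation;
- all function names.

It also shows the pignistic transform, a common way to turn masses back into ordinary class probabilities.

```python
"""Sketch: belief and plausibility over focal sets (Dempster-Shafer style).

Assumptions (illustrative, not the paper's code): focal sets are frozensets
of fine-class names, and a network emits one logit per focal set, which a
softmax turns into masses.
"""
import math
from typing import Dict, FrozenSet, List

Mass = Dict[FrozenSet[str], float]


def masses_from_logits(focal_sets: List[FrozenSet[str]], logits: List[float]) -> Mass:
    # Numerically stable softmax over focal-set logits -> masses summing to 1.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return {A: e / total for A, e in zip(focal_sets, exps)}


def belief(m: Mass, hypothesis: FrozenSet[str]) -> float:
    # Bel(A): total mass of focal sets lying entirely inside A.
    return sum(v for B, v in m.items() if B <= hypothesis)


def plausibility(m: Mass, hypothesis: FrozenSet[str]) -> float:
    # Pl(A): total mass of focal sets that intersect A at all.
    return sum(v for B, v in m.items() if B & hypothesis)


def pignistic(m: Mass) -> Dict[str, float]:
    # BetP: spread each focal set's mass evenly over its members,
    # yielding an ordinary probability over fine classes.
    p: Dict[str, float] = {}
    for B, v in m.items():
        for x in B:
            p[x] = p.get(x, 0.0) + v / len(B)
    return p


if __name__ == "__main__":
    frame = frozenset({"cat", "dog", "wolf", "car"})
    m: Mass = {
        frozenset({"cat"}): 0.10,
        frozenset({"dog"}): 0.30,
        frozenset({"dog", "wolf"}): 0.40,  # hypothetical learnt "dog-or-wolf" set
        frozenset({"car"}): 0.05,
        frame: 0.15,                        # mass on the whole frame = ignorance
    }
    animal = frozenset({"cat", "dog", "wolf"})
    for name, A in [("dog", frozenset({"dog"})), ("animal", animal)]:
        print(f"{name}: Bel={belief(m, A):.3f} Pl={plausibility(m, A):.3f}")
    print({k: round(v, 4) for k, v in sorted(pignistic(m).items())})
```

In this toy example the evidence for "dog" ranges from Bel = 0.30 to Pl = 0.85. The coarse "animal" hypothesis is much tighter, at 0.80 to 0.95. This illustrates how set-valued evidence can be uncertain at the fine level while staying confident at the coarse level.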
Abstract: Deep neural networks achieve high accuracy on image classification tasks. Yet they often produce overconfident predictions that fail to express epistemic uncertainty, and they frequently violate logical or structural constraints present in the data. These limitations are particularly pronounced in hierarchical classification, where predictions across fine and coarse levels must remain coherent. We propose, for the first time, a unified neurosymbolic and epistemic modelling framework that augments Swin Transformers with focal set reasoning and differentiable fuzzy logic. Rather than treating labels as isolated categories, our method induces data-driven focal sets within the learnt embedding space, which helps capture epistemic uncertainty over multiple plausible fine-grained classes. These focal sets form the basis of a belief-theoretic layer that uses fuzzy membership functions and t-norm conjunctions to encourage consistency between fine- and coarse-grained predictions. A learnable loss further balances calibration, mass regularisation, and logical consistency, allowing the model to adaptively trade off symbolic structure against data-driven evidence. In experiments on hierarchical image classification, our framework maintains accuracy on par with transformer baselines while providing more calibrated and interpretable predictions, reducing overconfidence and enforcing high logical consistency across hierarchical outputs. Our experimental results show that combining focal set reasoning with fuzzy logic provides a practical step toward deep learning models that are both accurate and epistemically aware.
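The abstract says a learnable loss balances calibration, mass regularisation, and logical consistency. It does not give the form of that loss. One widely used way to learn such trade-offs is homoscedastic-uncertainty weighting (Kendall, Gal and Cipolla, 2018). In its common simplified form, each term L_i gets a learnable log-variance s_i, and the total loss is Σ exp(−s_i)·L_i + s_i.

The sketch below uses that technique as a stand-in; it is not confirmed to be the paper's loss. Several details are hypothetical:

- the term names, which are placeholders;
- the loss values;
- the function names.

The sketch holds the component losses fixed, so only the weights are optimised.

```python
"""Sketch: learnable weighting of several loss terms.

Swapped-in technique (not confirmed as the paper's): homoscedastic-uncertainty
weighting in the simplified form  L = sum_i exp(-s_i) * L_i + s_i,
where each s_i is a learnable log-variance. Term names are placeholders.
"""
import math
from typing import Dict


def total_loss(losses: Dict[str, float], log_vars: Dict[str, float]) -> float:
    # A large s_i down-weights term i; the "+ s_i" penalty stops s_i growing forever.
    return sum(math.exp(-log_vars[k]) * losses[k] + log_vars[k] for k in losses)


def grad_wrt_log_vars(losses: Dict[str, float], log_vars: Dict[str, float]) -> Dict[str, float]:
    # d/ds [exp(-s) * L + s] = 1 - exp(-s) * L
    return {k: 1.0 - math.exp(-log_vars[k]) * losses[k] for k in losses}


def fit_log_vars(losses: Dict[str, float], steps: int = 2000, lr: float = 0.05) -> Dict[str, float]:
    # Plain gradient descent on the s_i with the component losses held fixed.
    # In real training the losses also change as the network's weights update.
    log_vars = {k: 0.0 for k in losses}
    for _ in range(steps):
        g = grad_wrt_log_vars(losses, log_vars)
        log_vars = {k: log_vars[k] - lr * g[k] for k in log_vars}
    return log_vars


if __name__ == "__main__":
    # Hypothetical current values of three loss terms.
    losses = {"calibration": 0.5, "mass_reg": 0.05, "consistency": 0.2}
    s = fit_log_vars(losses)
    for k in losses:
        print(f"{k}: s={s[k]:.3f}, weight exp(-s)={math.exp(-s[k]):.2f}")
```

With fixed losses, the optimum is s_i = log L_i, so each weight becomes 1/L_i. Terms that are currently large are therefore down-weighted relative to small ones. This is one concrete sense in which such a loss "adaptively" trades off its components.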
| Field | Details |
|---|---|
| Comments | 36 pages |
| Subjects | Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (stat.ML) |
| Cite as | arXiv:2605.16383 [cs.CV] (or arXiv:2605.16383v1 [cs.CV] for this version) |
| DOI | https://doi.org/10.48550/arXiv.2605.16383 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Fabio Cuzzolin
[v1] Mon, 11 May 2026 09:43:43 UTC (696 KB)
Originally published at arxiv.org.