A self-supervised learning approach to deep filter banks for texture recognition
Quick Take
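This study proposes a self-supervised pretraining framework for texture recognition built on a convolutional autoencoder rather than a vision transformer, combined with deep filters and Fisher vector pooling. According to the abstract, the approach improves recognition performance without adding significant computational burden, and comparisons with several state-of-the-art methods on different texture databases support its potential in both classification accuracy and computational complexity. The problem it targets is the limited training data common in real-world applications.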
Key Points
- Proposes a convolutional autoencoder as the self-supervised pretraining model for texture recognition.
- Utilizes deep filters and Fisher vector pooling for improved performance.
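- Avoids the computationally intensive architectures, such as vision transformers, that masked autoencoders usually rely on.
- Is compared with several state-of-the-art methods on different texture databases, in terms of both accuracy and computational complexity.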
- Addresses data scarcity issues in real-world texture recognition tasks.
Article Content
From source RSS / original summary: arXiv:2605.27843v1. Abstract: An important challenge in texture recognition is the limited amount of data for training frequently found in real-world applications. In computer vision in general, a successful strategy to mitigate this issue is the use of a pretraining stage where the neural network learns to identify relations between parts of the data in a self-supervised manner. A well-established framework in this direction is the masked autoencoder.
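To make the masked-reconstruction idea concrete, here is a minimal sketch in plain Python. It hides a random subset of image patches and scores a reconstruction only on the hidden pixels. The function names (`mask_patches`, `masked_mse`), the 2x2 patch size, and the 0.75 masking ratio are illustrative assumptions, not details taken from the paper, and the "reconstruction" here is a stand-in rather than a trained network.

```python
import random

def mask_patches(image, patch, ratio, rng):
    """Zero out a random subset of non-overlapping patch x patch blocks.

    Returns the masked copy and a same-sized 0/1 mask (1 = hidden pixel).
    """
    h, w = len(image), len(image[0])
    blocks = [(r, c) for r in range(0, h, patch) for c in range(0, w, patch)]
    hidden = rng.sample(blocks, int(round(ratio * len(blocks))))
    masked = [row[:] for row in image]
    mask = [[0] * w for _ in range(h)]
    for r0, c0 in hidden:
        for r in range(r0, min(r0 + patch, h)):
            for c in range(c0, min(c0 + patch, w)):
                masked[r][c] = 0.0
                mask[r][c] = 1
    return masked, mask

def masked_mse(target, prediction, mask):
    """Mean squared error computed over hidden pixels only."""
    errors = [(t - p) ** 2
              for t_row, p_row, m_row in zip(target, prediction, mask)
              for t, p, m in zip(t_row, p_row, m_row) if m]
    return sum(errors) / len(errors) if errors else 0.0

rng = random.Random(0)
# Synthetic 8x8 "texture" with a repeating pattern (assumed toy data).
image = [[float((r * 8 + c) % 5) for c in range(8)] for r in range(8)]
masked, mask = mask_patches(image, patch=2, ratio=0.75, rng=rng)

hidden_pixels = sum(map(sum, mask))
print(hidden_pixels)  # 48: 12 of the 16 blocks, 4 pixels each

# Stand-in "reconstruction": echo the masked input unchanged.
loss = masked_mse(image, masked, mask)
```

In an actual pretraining loop, the prediction would come from the autoencoder and the loss would drive its weight updates; the sketch only shows what the objective measures.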
Nevertheless, these models usually rely on computationally intensive architectures, such as vision transformers. In the particular case of texture images, most of the relevant information is compacted within a delimited area around each pixel, which suggests that capturing long-range dependence via the attention mechanism may be unnecessary. Based on that assumption, here we propose a framework where the pretraining model is a convolutional autoencoder.
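The locality argument can be illustrated with some simple arithmetic. The sketch below computes the receptive field of a stack of convolutions using the standard recurrence (each layer widens the field by `(kernel - 1)` times the accumulated stride) and, for contrast, counts the token pairs a single global self-attention layer scores. The three-layer encoder, the 224x224 input, and the 16x16 patch size are hypothetical choices for illustration; the abstract does not specify the architecture. Dilation and padding are ignored.

```python
def receptive_field(layers):
    """Receptive field, in input pixels, of a stack of conv layers.

    `layers` is a list of (kernel_size, stride) pairs, applied in order.
    """
    field, jump = 1, 1
    for kernel, stride in layers:
        field += (kernel - 1) * jump  # growth scaled by accumulated stride
        jump *= stride
    return field

def attention_pairs(height, width, patch):
    """Number of token pairs one global self-attention layer scores."""
    tokens = (height // patch) * (width // patch)
    return tokens * tokens

# Hypothetical encoder: three 3x3 convolutions with stride 2.
encoder = [(3, 2), (3, 2), (3, 2)]
print(receptive_field(encoder))        # 15
print(attention_pairs(224, 224, 16))   # 38416
```

Each output unit of this toy encoder sees only a 15x15 neighborhood, which is the kind of delimited area the abstract argues is sufficient for texture, whereas attention relates every token to every other token.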
To leverage the rich information conveyed by texture patterns, we employ deep filters coupled with Fisher vector pooling. In this way, we improve the performance of texture recognition without adding significant computational burden. Our approach is compared with several state-of-the-art methods in different texture databases, confirming its potential both in terms of classification accuracy and computational complexity.
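For readers unfamiliar with Fisher vector pooling, the following is a minimal sketch of the standard improved Fisher vector encoding under a diagonal Gaussian mixture model, written in plain Python. It assumes the local descriptors are feature vectors taken at each spatial position of a convolutional feature map, which is how deep filter banks are commonly pooled; the paper's exact pipeline may differ. The GMM parameters below are set by hand for a toy example rather than fitted, and the function name `fisher_vector` is illustrative.

```python
import math

def fisher_vector(descriptors, weights, means, sigmas):
    """Improved Fisher vector of local descriptors under a diagonal GMM.

    descriptors: T vectors of length D (e.g. conv feature-map columns).
    weights, means, sigmas: GMM parameters for K components.
    Returns a list of length 2 * K * D.
    """
    T, K, D = len(descriptors), len(weights), len(means[0])
    grad_mu = [[0.0] * D for _ in range(K)]
    grad_sigma = [[0.0] * D for _ in range(K)]
    log_2pi = math.log(2 * math.pi)
    for x in descriptors:
        # Log of w_k * N(x; mu_k, diag(sigma_k^2)) for each component.
        logp = []
        for k in range(K):
            s = math.log(weights[k])
            for d in range(D):
                z = (x[d] - means[k][d]) / sigmas[k][d]
                s -= 0.5 * z * z + math.log(sigmas[k][d]) + 0.5 * log_2pi
            logp.append(s)
        top = max(logp)  # subtract the max for numerical stability
        post = [math.exp(v - top) for v in logp]
        total = sum(post)
        for k in range(K):
            gamma = post[k] / total  # soft assignment of x to component k
            for d in range(D):
                z = (x[d] - means[k][d]) / sigmas[k][d]
                grad_mu[k][d] += gamma * z
                grad_sigma[k][d] += gamma * (z * z - 1.0)
    fv = []
    for k in range(K):  # first-order (mean) statistics
        fv += [g / (T * math.sqrt(weights[k])) for g in grad_mu[k]]
    for k in range(K):  # second-order (variance) statistics
        fv += [g / (T * math.sqrt(2 * weights[k])) for g in grad_sigma[k]]
    # Signed square-root (power) normalization, then L2 normalization.
    fv = [math.copysign(math.sqrt(abs(v)), v) for v in fv]
    norm = math.sqrt(sum(v * v for v in fv)) or 1.0
    return [v / norm for v in fv]

# Toy example: 4 two-dimensional descriptors, 2-component GMM (assumed values).
descriptors = [[0.1, 0.2], [0.0, -0.1], [2.1, 1.9], [1.8, 2.2]]
weights = [0.5, 0.5]
means = [[0.0, 0.0], [2.0, 2.0]]
sigmas = [[1.0, 1.0], [1.0, 1.0]]
fv = fisher_vector(descriptors, weights, means, sigmas)
print(len(fv))  # 8
```

The encoding is orderless: it discards where each descriptor came from and keeps only how the set of descriptors deviates from the mixture model, which suits textures. The resulting fixed-length vector would typically be passed to a linear classifier.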