Harnessing Self-Supervised Features for Art Classification

arXiv cs.CV·Federico Melis, Davide Bilardello, Emanuele Prato, Evelyn Turri, Lorenzo Baraldi

Quick Take

This paper systematically compares supervised and self-supervised backbones as feature extractors for artwork classification and retrieval, with a focus on paintings.

Key Points

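Evaluates the DINO family and CLIP models as feature extractors for artwork classification and retrieval.
A self-supervised backbone yields consistent improvements in classification performance.
Offers insights for real-world uses such as VR applications that support museum navigation.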

[Submitted on 18 May 2026]

Abstract: Classifying artworks presents a significant challenge due to the complex interplay of fine-grained details and abstract features that condition the style or genre of an artwork. This paper presents a systematic investigation of the effectiveness of supervised and self-supervised backbones as feature extractors for both artwork classification and retrieval, with a particular focus on paintings. We conduct an extensive experimental evaluation using the DINO family and CLIP models, assessing multiple classification strategies and feature representations. Our results demonstrate that employing a self-supervised backbone leads to consistent improvements in artwork classification performance. Moreover, our work provides insights into the applicability of classification and retrieval modules in real-world applications, such as virtual reality (VR) applications that support museum navigation.
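
To make the idea of a frozen backbone used as a feature extractor more concrete, here is a minimal sketch of how both retrieval and classification can run on top of precomputed embeddings. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say which classification strategies were evaluated, so cosine-similarity retrieval and k-nearest-neighbour voting are stand-ins chosen for simplicity. The function names (l2_normalize, retrieve, knn_classify) are hypothetical, and the embeddings are synthetic arrays rather than outputs of DINO or CLIP.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def retrieve(query: np.ndarray, gallery: np.ndarray, k: int = 5) -> np.ndarray:
    """Return, for each query row, the indices of the k most similar gallery rows."""
    sims = l2_normalize(query) @ l2_normalize(gallery).T
    return np.argsort(-sims, axis=1)[:, :k]


def knn_classify(query: np.ndarray, gallery: np.ndarray,
                 gallery_labels: np.ndarray, k: int = 5) -> np.ndarray:
    """Predict a label per query by majority vote over its k nearest gallery items."""
    neighbours = retrieve(query, gallery, k)
    votes = gallery_labels[neighbours]  # shape: (n_query, k)
    n_classes = int(gallery_labels.max()) + 1
    counts = np.stack([np.bincount(v, minlength=n_classes) for v in votes])
    return counts.argmax(axis=1)


# ASSUMPTION: synthetic embeddings stand in for features from a frozen backbone.
# Each "style" class is a well-separated cluster around a scaled one-hot center.
rng = np.random.default_rng(0)
n_classes, dim, per_class = 3, 16, 20
centers = 10.0 * np.eye(n_classes, dim)

gallery_labels = np.repeat(np.arange(n_classes), per_class)
gallery = centers[gallery_labels] + 0.3 * rng.normal(size=(n_classes * per_class, dim))

query_labels = np.arange(n_classes)
query = centers[query_labels] + 0.3 * rng.normal(size=(n_classes, dim))

top5 = retrieve(query, gallery, k=5)
pred = knn_classify(query, gallery, gallery_labels, k=5)
print("retrieved indices:\n", top5)
print("predicted labels:", pred)
```

In a real setting the gallery and query arrays would be replaced by embeddings extracted once from the chosen backbone. The same similarity matrix then serves both the retrieval module and the classifier, which is one reason a single feature extractor is convenient for applications that need both.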

Comments:	IRCDL 2026
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
Cite as:	arXiv:2605.18974 [cs.CV]
	(or arXiv:2605.18974v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2605.18974 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Evelyn Turri
[v1] Mon, 18 May 2026 18:00:59 UTC (14,142 KB)

— Originally published at arxiv.org
