Efficient coding along the visual hierarchy
Quick Take
Efficient coding can build a hierarchy of human-aligned visual features from limited data, which may help explain the data efficiency of biological vision.
Key Points
- Unsupervised learning captures natural image statistics.
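- Without labels, features progress from edges and colors to textures and shapes.
- A hybrid of efficient coding and supervised fine-tuning improves brain alignment in low-data settings and speeds category learning.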
Abstract: Biological visual systems learn from limited experience, unlike deep learning models that rely on millions of training images. What learning principles make this possible? We tested whether efficient coding, the idea that neural representations capture the statistical structure of natural inputs, can build a hierarchy of human-aligned visual features from limited data. We developed an unsupervised learning procedure in which each layer of a deep network compresses its inputs onto the dominant modes of variation in natural images, using only local statistics and no labels, tasks, or backpropagation. This unsupervised procedure yields features that progress from edges and colors to textures and shapes. The features of this deep efficient coding model are readily recognized by human observers and are predictive of image-evoked fMRI responses in human visual cortex. Furthermore, a hybrid learning procedure that combines efficient coding with supervised fine-tuning yields better brain alignment in low-data settings and more rapid category learning. These findings suggest that efficient coding may shape representations across the entire visual hierarchy and help explain the data efficiency of biological vision.
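The idea of a layer that "compresses its inputs onto the dominant modes of variation using only local statistics" can be illustrated with a layer-wise PCA sketch. This is not the authors' implementation. The patch sizes, strides, number of retained components, the half-wave rectification between layers, and every function name below (`extract_patches`, `fit_pca_layer`, `apply_layer`, `train_efficient_coding_stack`) are hypothetical choices made for illustration, and random noise stands in for natural images.

```python
import numpy as np


def extract_patches(x, size, stride):
    """Slide a size x size window over x of shape (N, H, W, C).

    Returns an array of shape (N, H_out, W_out, size * size * C).
    """
    n, h, w, c = x.shape
    h_out = (h - size) // stride + 1
    w_out = (w - size) // stride + 1
    out = np.empty((n, h_out, w_out, size * size * c), dtype=x.dtype)
    for i in range(h_out):
        for j in range(w_out):
            r, s = i * stride, j * stride
            out[:, i, j, :] = x[:, r:r + size, s:s + size, :].reshape(n, -1)
    return out


def fit_pca_layer(patches, k):
    """Fit the top-k principal directions of this layer's input patches.

    Only second-order statistics of the layer's own inputs are used:
    no labels, no task, and no gradients from later layers.
    """
    flat = patches.reshape(-1, patches.shape[-1])
    mean = flat.mean(axis=0)
    cov = np.cov(flat - mean, rowvar=False)   # rows are observations
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:k]     # keep dominant modes first
    return mean, eigvecs[:, order]


def apply_layer(patches, mean, components):
    """Project onto the retained modes, then half-wave rectify.

    The rectification (split into positive and negative parts) is an
    illustrative assumption: without some nonlinearity, stacked PCA
    layers would collapse into a single linear map.
    """
    z = (patches - mean) @ components
    return np.concatenate([np.maximum(z, 0), np.maximum(-z, 0)], axis=-1)


def train_efficient_coding_stack(images, layer_specs):
    """Greedily fit each layer on the outputs of the layer below it."""
    params, x = [], images
    for size, stride, k in layer_specs:
        patches = extract_patches(x, size, stride)
        mean, comps = fit_pca_layer(patches, k)
        params.append((size, stride, mean, comps))
        x = apply_layer(patches, mean, comps)
    return params, x


# Usage with random noise as a stand-in for a batch of natural images.
rng = np.random.default_rng(0)
images = rng.standard_normal((20, 32, 32, 3))
params, features = train_efficient_coding_stack(images, [(5, 2, 8), (3, 2, 8)])
print(features.shape)  # (20, 6, 6, 16)
```

Because each layer is fit only on the outputs of the layer below, no error signal flows backward through the stack. A hybrid procedure would, under this reading, use such features as an initialization for supervised fine-tuning.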
| Comments: | 34 pages, 6 figures |
| Subjects: | Computer Vision and Pattern Recognition (cs.CV) |
| Cite as: | arXiv:2605.19155 [cs.CV] |
| | (or arXiv:2605.19155v1 [cs.CV] for this version) |
| | https://doi.org/10.48550/arXiv.2605.19155 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Ananya Passi
[v1] Mon, 18 May 2026 22:20:17 UTC (6,538 KB)
— Originally published at arxiv.org