Meta-learning as a principle for human-like visual representations
Quick Answer
This study proposes that meta-learning, a pressure to acquire new tasks from only a few observations, makes the visual representations of neural networks more human-like.
Quick Take
The authors train a sequence model across thousands of semantically rich tasks that map images to high-level concepts, without any supervision from human data. Compared to their pretrained base encoders, the meta-learned representations better predict human similarity judgments, semantic rule learning, and high-level visual cortex. The authors read this as evidence that the flexibility of human visual representations reflects the demand to learn new semantic relationships on the fly.
Key Points
- Meta-learning enables neural networks to adapt to new tasks with few observations.
- Meta-learned representations predict human similarity judgments, semantic rule learning, and high-level visual cortex better than their pretrained base encoders.
- The study involved training a sequence model on thousands of semantically rich tasks.
- Flexibility in visual representations reflects the need to learn new semantic relationships.
- Behavioral gains depend on disentangled, high-level task distributions, while brain alignment is driven primarily by the learning-to-learn pressure.
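To make "adapting to a new task from few observations" concrete, here is a minimal sketch of a few-shot episode. It is not the paper's sequence model: the helper names `sample_episode` and `prototype_predict`, the toy embeddings, and the nearest-class-mean readout are all assumptions chosen for illustration. It only shows how one task can be split into a handful of labelled context items and held-out queries.

```python
import numpy as np

def sample_episode(features, labels, n_shots, rng):
    """Pick n_shots labelled context items per class; the rest become queries."""
    ctx = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=n_shots, replace=False)
        for c in np.unique(labels)
    ])
    qry = np.setdiff1d(np.arange(len(labels)), ctx)
    return (features[ctx], labels[ctx]), (features[qry], labels[qry])

def prototype_predict(ctx_x, ctx_y, qry_x):
    """Label each query with the class whose context mean is nearest (assumed readout)."""
    classes = np.unique(ctx_y)
    protos = np.stack([ctx_x[ctx_y == c].mean(axis=0) for c in classes])
    dists = ((qry_x[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]

rng = np.random.default_rng(0)
# Toy embeddings (synthetic, not real image features): two well-separated
# clusters standing in for one binary concept.
x = np.concatenate([rng.normal(-3.0, 0.5, (20, 8)), rng.normal(3.0, 0.5, (20, 8))])
y = np.array([0] * 20 + [1] * 20)

(ctx_x, ctx_y), (qry_x, qry_y) = sample_episode(x, y, n_shots=3, rng=rng)
accuracy = float((prototype_predict(ctx_x, ctx_y, qry_x) == qry_y).mean())
print(f"context items: {len(ctx_y)}, query accuracy: {accuracy:.2f}")
```

In a meta-learning setup, many such episodes drawn from different tasks would be used to train the model itself, so that the representation is shaped by how well new tasks can be picked up from the context items.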
Abstract: The structure of human visual representations underpins our capacity for adaptive behaviour. While pretrained neural networks model human visual representations with unprecedented success, a large discrepancy remains. We propose one reason: these networks optimise a single fixed objective, whereas human representations must support open-ended tasks. We hypothesise this flexibility arises from meta-learning (learning to learn), a pressure shaping representations to acquire new tasks from few observations. To test this, we train a sequence model, without any supervision from human data, across thousands of semantically rich tasks mapping images to high-level concepts. Compared to their pretrained base encoders, meta-learned representations better predict human similarity judgements, semantic rule learning, and high-level visual cortex. Behavioural gains depend on disentangled, high-level task distributions, while brain alignment is driven primarily by the learning-to-learn pressure. Our results suggest the flexibility of human visual representations reflects the functional demand to learn new semantic relationships on the fly.
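One common way to quantify how well a representation predicts human similarity judgements is to rank-correlate the model's pairwise similarities with the human ones. The sketch below illustrates that general idea only. The abstract does not state the exact metric used, so the cosine similarity, the Spearman correlation, the function names, and the synthetic "human" matrix are all assumptions.

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between row vectors."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return unit @ unit.T

def spearman(a, b):
    """Spearman correlation as Pearson correlation of ranks (ties not averaged)."""
    rank_a = np.argsort(np.argsort(a)).astype(float)
    rank_b = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

def alignment_score(embeddings, human_similarity):
    """Rank-correlate model and human similarities over unique item pairs."""
    model_sim = cosine_similarity_matrix(embeddings)
    upper = np.triu_indices(len(embeddings), k=1)  # skip diagonal and duplicates
    return spearman(model_sim[upper], human_similarity[upper])

rng = np.random.default_rng(0)
emb = rng.normal(size=(10, 16))
# Placeholder for a human similarity matrix: synthetic, derived from `emb`
# purely so the example runs. It is not real behavioural data.
human = cosine_similarity_matrix(emb)

score_same = alignment_score(emb, human)
score_random = alignment_score(rng.normal(size=(10, 16)), human)
print(f"matching embeddings: {score_same:.2f}, unrelated embeddings: {score_random:.2f}")
```

Under this kind of measure, comparing a meta-learned encoder with its pretrained base would amount to computing the score for each set of embeddings against the same human matrix.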
| Subjects: | Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC) |
| Cite as: | arXiv:2606.28399 [cs.CV] |
| | (or arXiv:2606.28399v1 [cs.CV] for this version) |
| | https://doi.org/10.48550/arXiv.2606.28399 (arXiv-issued DOI via DataCite) |
Submission history
From: Can Demircan
[v1] Wed, 24 Jun 2026 09:28:17 UTC (12,425 KB)
— Originally published at arxiv.org