Meta-learning as a principle for human-like visual representations

arXiv cs.CV·Can Demircan, Marcel Binz, Alireza Modirshanechi, Eric Schulz

1d ago

·~2 min·6/30/2026·en·0

Quick Answer

This study proposes that human-like visual representations in neural networks can be enhanced through meta-learning, allowing models to adapt to new tasks with minimal data.

Quick Take

This study proposes that human-like visual representations in neural networks can be enhanced through meta-learning, allowing models to adapt to new tasks with minimal data. By training a sequence model on diverse tasks, the authors found that meta-learned representations outperform pretrained encoders in predicting human similarity judgments and learning semantic rules, highlighting the importance of flexibility in visual processing.

Key Points

Meta-learning enables neural networks to adapt to new tasks with few observations.
Meta-learned representations better predict human similarity judgments than pretrained models.
The study involved training a sequence model on thousands of semantically rich tasks.
Flexibility in visual representations reflects the need to learn new semantic relationships.
Behavioral gains depend on high-level task distributions and learning-to-learn pressure.

Paper Resources

Read Paperarxiv.org View PDFarxiv.org

📖 Reader Mode

~2 min read

[Submitted on 24 Jun 2026]

View PDF HTML (experimental)

Abstract:The structure of human visual representations underpins our capacity for adaptive behaviour. While pretrained neural networks model human visual representations with unprecedented success, a large discrepancy remains. We propose one reason: these networks optimise a single fixed objective, whereas human representations must support open-ended tasks. We hypothesise this flexibility arises from meta-learning (learning to learn), a pressure shaping representations to acquire new tasks from few observations. To test this, we train a sequence model, without any supervision from human data, across thousands of semantically rich tasks mapping images to high-level concepts. Compared to their pretrained base encoders, meta-learned representations better predict human similarity judgements, semantic rule learning, and high-level visual cortex. Behavioural gains depend on disentangled, high-level task distributions, while brain alignment is driven primarily by the learning-to-learn pressure. Our results suggest the flexibility of human visual representations reflects the functional demand to learn new semantic relationships on the fly.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
Cite as:	arXiv:2606.28399 [cs.CV]
	(or arXiv:2606.28399v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2606.28399 arXiv-issued DOI via DataCite

Submission history

From: Can Demircan [view email]
[v1] Wed, 24 Jun 2026 09:28:17 UTC (12,425 KB)

— Originally published at arxiv.org

Continue reading on arxiv.org

Want this in your inbox every morning?

Daily brief at your local 8am — bilingual EN/中文, free.

Subscribe — it's free

More from arXiv cs.CV

See more →

arXiv cs.CV·Shahrzad Esmat, Chaunte W. Lacewell, Sameh Gobriel, Nilesh Jain, Ali Jannesari

3w ago

FeaturedOriginal

LLM-Guided ANN Index Optimization for Human-Object Interaction Retrieval

AI Summary

A phase-aware LLM agent optimizes human-object interaction retrieval, outperforming Optuna TPE by 33.3% and VDTuner by 34.2% on the HICO-DET benchmark. This method enhances throughput by 15.3x over UniIR and demonstrates strong transferability across vector database management systems.

#LLM #Agent #Inference #AI Startup