ComMem: Complementary Memory Systems for Test-Time Adaptation of Vision-Language Models

arXiv cs.AI·Guanglong Sun, Shuang Cui, Bo Lei, Liyuan Wang, Zihan Zhai, Hongwei Yan, Hang Su, Jun Zhu, Yi Zhong

6/30/2026

Quick Answer

ComMem introduces a dual-memory system for test-time adaptation in vision-language models, outperforming existing methods on 15 benchmark datasets.

Quick Take

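ComMem introduces a dual-memory system for test-time adaptation in vision-language models (VLMs), outperforming existing methods on 15 benchmark datasets. Modeled on the complementary roles of the hippocampus and neocortex, it combines a fast visual cache built from high-confidence test samples with slowly refined textual prototypes, and jointly optimizes the two for each test instance to keep them consistent across modalities under distribution shifts.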

Key Points

ComMem mimics the complementary roles of the brain's hippocampus and neocortex for test-time adaptation (TTA) of vision-language models (VLMs).
It pairs a fast-adapting memory that caches visual features from high-confidence test samples with a slow-integrating memory that refines global textual prototypes.
Experiments on 15 benchmark datasets show significant improvements over state-of-the-art methods.
The gains hold under both natural distribution shifts and cross-dataset generalization.
The authors present it as a promising direction for improving the practical adaptability of VLMs.
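
As a rough illustration of what a slow-integrating text memory could look like, the sketch below nudges a class's text prototype toward confident test features with a small step. This is an assumption made for illustration only: the summary above does not say how ComMem refines its prototypes, and the exponential-moving-average rule, the function names, and the rate and threshold values are all hypothetical stand-ins.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def slow_update(prototype, feature, confidence, rate=0.01, threshold=0.9):
    """Hypothetical slow memory step: move one class prototype slightly
    toward a test feature, but only when the prediction is confident.

    rate and threshold are illustrative values, not taken from the paper.
    """
    if confidence < threshold:
        return list(prototype)  # low-confidence samples leave the memory untouched
    step = rate * confidence
    mixed = [(1.0 - step) * p + step * f for p, f in zip(prototype, feature)]
    return normalize(mixed)


prototype = [1.0, 0.0]
feature = normalize([1.0, 1.0])
updated = slow_update(prototype, feature, confidence=1.0)
```

Because the step is small, a single sample barely moves the prototype; only a consistent stream of confident samples shifts it, which is the sense in which such a memory integrates slowly.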


[Submitted on 27 Jun 2026]


Abstract: Test-time adaptation (TTA) of vision-language models (VLMs) is essential for their robust deployment in dynamic, real-world environments. However, existing TTA methods often adapt locally without accumulating knowledge over time, or operate within a single modality without exploiting VLMs' inherently multi-modal nature. Inspired by the Complementary Memory systems of the biological brain, we propose ComMem, an innovative approach that mimics the distinct but cooperative roles of the hippocampus and neocortex to enable effective TTA for VLMs. ComMem consists of two key components: a fast-adapting detailed memory, akin to the hippocampus, that forms a dynamic visual cache from high-confidence test samples; and a slow-integrating abstract memory, akin to the neocortex, that continually refines global textual prototypes. For each test instance, ComMem jointly optimizes both memory systems to ensure cross-modal consistency. Extensive experiments on 15 benchmark datasets show that ComMem significantly outperforms state-of-the-art methods under both natural distribution shifts and cross-dataset generalization, offering a promising direction for enhancing VLMs' practical adaptability.
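
To make the idea of a visual cache built from high-confidence test samples more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the confidence measure, the cache size, or how cache and text predictions are fused, so the entropy-based ranking, the per-class capacity, the additive fusion, and every name and constant below (VisualCache, predict, sharpness, scale, alpha) are assumptions chosen for illustration. Features and prototypes are assumed to be unit-normalized.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class VisualCache:
    """Hypothetical fast memory: per pseudo-label, keep the lowest-entropy features."""

    def __init__(self, num_classes, capacity=3):
        self.capacity = capacity
        self.slots = [[] for _ in range(num_classes)]  # lists of (entropy, feature)

    def update(self, feature, probs):
        label = max(range(len(probs)), key=probs.__getitem__)
        slot = self.slots[label]
        slot.append((entropy(probs), feature))
        slot.sort(key=lambda item: item[0])  # most confident entries first
        del slot[self.capacity:]             # evict the least confident

    def logits(self, feature, sharpness=5.0):
        # Similarity-weighted vote of cached features for each class.
        return [
            sum(math.exp(-sharpness * (1.0 - dot(feature, f))) for _, f in slot)
            for slot in self.slots
        ]


def predict(feature, text_prototypes, cache, scale=100.0, alpha=1.0):
    """Fuse text-prototype logits with cache logits, then store the sample."""
    text_logits = [scale * dot(feature, t) for t in text_prototypes]
    cache_logits = cache.logits(feature)
    fused = [t + alpha * c for t, c in zip(text_logits, cache_logits)]
    cache.update(feature, softmax(text_logits))
    return softmax(fused)


cache = VisualCache(num_classes=2)
prototypes = [[1.0, 0.0], [0.0, 1.0]]
probs = predict([1.0, 0.0], prototypes, cache)
```

The sketch only covers caching and fusion. The joint optimization of both memories for cross-modal consistency that the abstract describes is not modeled here.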

Comments:	A brain-inspired complementary memory framework leveraging fast visual caching and slow textual refinement for VLM test-time adaptation
Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2606.28719 [cs.AI]
	(or arXiv:2606.28719v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2606.28719 arXiv-issued DOI via DataCite

Submission history

From: Guanglong Sun
[v1] Sat, 27 Jun 2026 03:55:04 UTC (712 KB)

— Originally published at arxiv.org

