MemoBench: Benchmarking World Modeling in Dynamically Changing Environments

arXiv cs.CV·Haoyu Chen, Kaichen Zhou, Hang Hua, Kaile Zhang, Jingwen Qian, Wufei Ma, Haonan Chen, Chunjiang Liu, Yizhou Zhao, Xiaoyuan Wang, Weiyue Li, Alan Yuille, Paul Pu Liang, Yilun Du

Quick Take

MemoBench introduces a new benchmark for evaluating memory consistency in video generation models under dynamic conditions, focusing on the disappear-and-reappear paradigm. It includes 360 ground-truth clips spanning synthetic and real-world scenes and evaluates eight state-of-the-art models, exposing open challenges for memory consistency when the scene keeps changing while an object is out of view.

Key Points

MemoBench evaluates memory consistency in dynamically changing environments.
The benchmark includes 360 ground-truth clips from synthetic and real-world scenes.
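Models are assessed under the disappear-and-reappear paradigm: a target object undergoes a physical process, disappears from view, and must reappear in its updated state.
Eight state-of-the-art models were evaluated, revealing key insights and open challenges for memory consistency.
The evaluation suite combines automated metrics with VQA-based assessment across four diagnostic pillars.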

[Submitted on 25 Jun 2026]

Abstract: Video generation models aspire to simulate dynamic environments, and several benchmarks now evaluate memory consistency across frames. However, most assess consistency only while the target remains in view, and the few that force objects out of view evaluate static scenes where nothing changes during occlusion. To bridge this gap, we introduce MemoBench, a diagnostic benchmark built around the disappear-and-reappear paradigm in dynamically changing environments: a target object undergoes a physical process, disappears from view, and must be correctly recovered in its updated state upon reappearance. We curate 360 ground-truth clips spanning synthetic and real-world scenes, and design an evaluation suite combining automated metrics with VQA-based assessment across four diagnostic pillars. Evaluation of eight state-of-the-art models reveals key insights and open challenges regarding memory consistency under the disappear-and-reappear paradigm.
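The disappear-and-reappear paradigm can be made concrete with a small scoring sketch. This is a hypothetical illustration, not MemoBench's evaluation code: it assumes each generated clip has already been annotated with per-frame visibility and a discrete state label for the target (for example, from a detector or a VQA query), and it counts a clip as correct only if the object reappears in the ground-truth updated state. The names `FrameObs`, `find_disappear_reappear`, `reappearance_correct`, and `memory_accuracy` are invented for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class FrameObs:
    """Hypothetical per-frame annotation of the target object."""
    visible: bool
    state: Optional[str]  # e.g. "solid", "melted"; None when not visible


def find_disappear_reappear(frames: Sequence[FrameObs]) -> Optional[Tuple[int, int]]:
    """Return (last visible frame before the gap, first visible frame after it)
    for the first occlusion gap, or None if the object never leaves and returns."""
    for i in range(1, len(frames)):
        if frames[i - 1].visible and not frames[i].visible:
            # The object has just disappeared; scan forward for its reappearance.
            for j in range(i + 1, len(frames)):
                if frames[j].visible:
                    return i - 1, j
            return None  # disappears but never reappears
    return None  # never disappears


def reappearance_correct(frames: Sequence[FrameObs], expected_state: str) -> bool:
    """True if the object reappears in the expected (updated) state."""
    span = find_disappear_reappear(frames)
    if span is None:
        return False
    _, reappear = span
    return frames[reappear].state == expected_state


def memory_accuracy(clips: Sequence[Tuple[Sequence[FrameObs], str]]) -> float:
    """Fraction of (frames, expected_state) pairs scored as correct."""
    if not clips:
        return 0.0
    hits = sum(reappearance_correct(frames, state) for frames, state in clips)
    return hits / len(clips)


# Usage: an object is solid, leaves view, and reappears melted.
clip = [
    FrameObs(True, "solid"),
    FrameObs(False, None),
    FrameObs(False, None),
    FrameObs(True, "melted"),
]
print(memory_accuracy([(clip, "melted"), (clip, "solid")]))  # 0.5
```

Scoring against the updated state rather than the pre-occlusion state is what distinguishes this setting from static-scene benchmarks: a model that simply restores the object as it looked before disappearing ("solid" above) would be marked wrong.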

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2606.27537 [cs.CV]
	(or arXiv:2606.27537v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2606.27537 arXiv-issued DOI via DataCite

Submission history

From: Haoyu Chen
[v1] Thu, 25 Jun 2026 20:37:39 UTC (9,565 KB)

— Originally published at arxiv.org

