Flat-Pack Bench: Evaluating Spatio-Temporal Understanding in Large Vision-Language Models through Furniture Assembly
Quick Take
Flat-Pack Bench is a benchmark that evaluates large vision-language models (LVLMs) on fine-grained spatio-temporal understanding in furniture assembly tasks.
Key Points
- Focuses on nuanced tasks in furniture assembly.
- Evaluates temporal ordering and localization of actions.
- Highlights limitations in LVLMs' spatio-temporal reasoning.