MAVEN: A Multi-stage Agentic Annotation Pipeline for Video Reasoning Tasks
Quick Take
MAVEN is a multi-stage pipeline for generating structured annotations for video reasoning tasks.
Key Points
- Transforms raw videos into multi-task training data.
- Supports agent-driven domain adaptation for new datasets.
- Reports significant accuracy improvements over existing models.
More from arXiv cs.CV
GeoSym127K: Scalable Symbolically-verifiable Synthesis for Multimodal Geometric Reasoning
GeoSym127K introduces a scalable neuro-symbolic framework for enhanced geometric reasoning in multimodal models.