VAMPS: Visual-Assisted Mathematical Problem Solving Benchmark
Quick Take
VAMPS is a benchmark for visual-assisted mathematical problem solving. It contains 1,168 multimodal, bilingual multiple-choice question-answer pairs drawn from Iranian University Entrance Exam problems and expanded with human-reviewed synthetic variants. Across a diverse set of models, direct analytical solving outperforms tool-enabled visual solving, which points to a gap in how multimodal models reason over visualizations they construct themselves.
Key Points
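- VAMPS consists of 1,168 multimodal, bilingual multiple-choice question-answer pairs.
- The problems are algebra and calculus questions drawn from Iranian University Entrance Exams and expanded with human-reviewed, LLM-generated synthetic variants.
- Direct analytical solving outperformed tool-enabled visual solving across a diverse set of models, even on problems where plotting is a natural strategy.
- The gap matters because real engineering and scientific workflows often rely on visualization tools for analysis, validation, and decision-making.
- VAMPS is designed for both benchmarking and diagnosis: it tests whether a model can benefit from constructing a useful graph and grounding its answer in the resulting visualization.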
Article Content
From the original summary (arXiv:2606.04244v1, announce type: new):
Multimodal large language models are increasingly capable of complex reasoning, yet their performance often degrades when they must externalize a problem through a tool and then reason over the tool's output, specifically when they rely on visual aids. This gap is especially important because real engineering and scientific workflows often rely on visualization tools for analysis, validation, and decision-making.
To study this discrepancy, we introduce VAMPS (Visual-Assisted Mathematical Problem Solving), a benchmark for graph-assisted mathematics. VAMPS contains 1,168 multimodal, bilingual multiple-choice question-answer pairs drawn from Iranian University Entrance Exam algebra and calculus problems and expanded with human-reviewed LLM-generated synthetic variants, all selected so that plotting provides a natural solution strategy by revealing intersections, extrema, asymptotes, etc.
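To make "plotting as a solution strategy" concrete, here is a minimal sketch of a graph-style solve for a multiple-choice question. The problem (how many real solutions does x^3 - 3x = 1 have?), the answer options, and the helper name find_intersections are all hypothetical illustrations and are not taken from VAMPS. The code samples both curves on a grid, as a plot would, and reads the intersections off the sign changes of their difference.

```python
def curve(x):
    # Hypothetical left-hand side of the equation: y = x^3 - 3x
    return x**3 - 3 * x


def line(x):
    # Hypothetical right-hand side of the equation: y = 1
    return 1.0


def find_intersections(f, g, lo, hi, samples=600, tol=1e-9):
    """Locate crossings of f and g on [lo, hi] from a uniform sample grid.

    A sign change of f - g between neighbouring samples brackets a crossing,
    which is then refined by bisection. Tangent touches and zeros that fall
    exactly on a grid point are not detected; this is a limit of the sketch.
    """
    def diff(x):
        return f(x) - g(x)

    step = (hi - lo) / samples
    roots = []
    left = lo
    for i in range(1, samples + 1):
        right = lo + i * step
        if diff(left) * diff(right) < 0:
            a, b = left, right
            while b - a > tol:
                mid = (a + b) / 2
                if diff(a) * diff(mid) <= 0:
                    b = mid
                else:
                    a = mid
            roots.append((a + b) / 2)
        left = right
    return roots


roots = find_intersections(curve, line, -3.0, 3.0)

# Hypothetical answer options: number of real solutions.
options = {"A": 0, "B": 1, "C": 2, "D": 3}
answer = next(label for label, count in options.items() if count == len(roots))
print(answer, [round(r, 3) for r in roots])
```

A plotting library would render the same samples as two curves. The point of the sketch is only that the answer is read off the sampled graph instead of being derived symbolically.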
Designed for both benchmarking and diagnosis, VAMPS goes beyond prior multimodal benchmarks that primarily evaluate reasoning over fixed visual inputs by testing whether a model can benefit from constructing a useful graph and grounding its answer in the resulting visualization. Overall, we found that across a diverse set of models, direct analytical solving surprisingly outperforms tool-enabled visual solving, even on problems where plotting is a natural strategy.
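The comparison between the two solving modes can be sketched as a per-mode accuracy tally over multiple-choice predictions. The record layout, the mode labels "direct" and "tool", and the toy records below are assumptions made for illustration. They are placeholders, not data or results from the paper.

```python
from collections import defaultdict

# Made-up placeholder records, NOT results from the paper.
# Each record is one model answer to one question in one solving mode.
records = [
    {"id": "q1", "mode": "direct", "predicted": "B", "gold": "B"},
    {"id": "q1", "mode": "tool", "predicted": "C", "gold": "B"},
    {"id": "q2", "mode": "direct", "predicted": "A", "gold": "A"},
    {"id": "q2", "mode": "tool", "predicted": "A", "gold": "A"},
    {"id": "q3", "mode": "direct", "predicted": "D", "gold": "C"},
    {"id": "q3", "mode": "tool", "predicted": "C", "gold": "C"},
    {"id": "q4", "mode": "direct", "predicted": "D", "gold": "D"},
    {"id": "q4", "mode": "tool", "predicted": "A", "gold": "D"},
]


def accuracy_by_mode(rows):
    """Return the fraction of correct answers for each solving mode."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        total[row["mode"]] += 1
        correct[row["mode"]] += int(row["predicted"] == row["gold"])
    return {mode: correct[mode] / total[mode] for mode in total}


def lost_with_tool(rows):
    """Return ids of questions answered correctly in direct mode only."""
    right = defaultdict(dict)
    for row in rows:
        right[row["id"]][row["mode"]] = row["predicted"] == row["gold"]
    return sorted(
        qid
        for qid, modes in right.items()
        if modes.get("direct") and not modes.get("tool")
    )


scores = accuracy_by_mode(records)
regressions = lost_with_tool(records)
print(scores, regressions)
```

Listing the questions that are lost only in tool mode is one simple way to use such a benchmark for diagnosis as well as for headline scores.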