ReverseMath: Answer Inversion for Scalable and Verifiable Mathematical Problem Generation

arXiv cs.CL·Raoyuan Zhao, Yihong Liu, Yupei Du, Hinrich Schütze, Michael A. Hedderich

·~1 min·5/28/2026·en·1

Quick Take

ReverseMath is a scalable method for generating new math problems through answer inversion: it masks a numerical value in an existing problem, supplies the original answer as a given, and asks for the masked value. Paired original/reversed problems expose substantial behavioral shifts in LLMs that suggest memorization-like behavior, and using the reversed problems as data augmentation for reinforcement learning improves mathematical reasoning performance across multiple benchmarks.

Key Points

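ReverseMath masks a numerical value in an existing problem and treats the original answer as a known condition, so the masked value becomes the new answer and is known by construction.
Models show substantial behavioral shifts on paired original/reversed problems, sometimes failing on the reversed version or incorrectly outputting the original answer.
Automatically labeled reversed problems serve as data augmentation for reinforcement learning (RL).
Including ReverseMath-generated data improves mathematical reasoning performance across multiple benchmarks.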
This method provides a scalable source of verifiable training data.

Article Content

From source RSS / original summary

arXiv:2605.27709v1. Abstract: Mathematical reasoning benchmarks are vital for evaluating large language models (LLMs), but many are static and repeatedly exposed through public evaluation and training pipelines, making it difficult to separate genuine reasoning from memorization. Meanwhile, manually constructing new math problems with reliable answers remains costly. We introduce ReverseMath, a scalable method for generating new math problems through answer inversion.

Given a problem and its answer, ReverseMath masks a numerical value in the original problem, treats the original answer as a known condition, and rewrites the problem so that the masked value becomes the new answer. The generated problem reverses the original input-output relation, making its answer known by construction. We study ReverseMath for both evaluation and training.
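As a rough illustration of the inversion step described above, the following Python sketch masks one number in a toy word problem and packages the original answer as a given. The function names (`mask_value`, `build_reversed_item`), the `[MASK]` placeholder, and the toy problem are all hypothetical and not taken from the paper. The abstract does not say how the final natural-language rewrite is produced, so the sketch stops at assembling the pieces.

```python
import re


def mask_value(problem: str, value: str, placeholder: str = "[MASK]") -> str:
    """Replace the first standalone occurrence of `value` in the problem text.

    The lookarounds keep "4" from matching inside "14" or "3.4".
    """
    pattern = r"(?<!\d)(?<!\d\.)" + re.escape(value) + r"(?!\d|\.\d)"
    masked, count = re.subn(pattern, placeholder, problem, count=1)
    if count == 0:
        raise ValueError(f"value {value!r} not found in problem")
    return masked


def build_reversed_item(problem: str, answer: str, value: str) -> dict:
    """Assemble the ingredients of a reversed problem (hypothetical structure).

    The masked value becomes the new answer, so the label is known
    by construction and needs no separate solver.
    """
    return {
        "masked_problem": mask_value(problem, value),
        "known_condition": f"The answer to the original question is {answer}.",
        "new_answer": value,
    }


# Toy example (invented for illustration): 4 notebooks * 3 dollars = 12 dollars.
original = "Tom buys 4 notebooks at 3 dollars each. How many dollars does he pay?"
item = build_reversed_item(original, answer="12", value="4")
print(item["masked_problem"])
print(item["known_condition"])
print("New answer:", item["new_answer"])
```

A rewritten version of this item might read "Tom buys some notebooks at 3 dollars each and pays 12 dollars. How many notebooks does he buy?", with 4 as the answer.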

For evaluation, paired original/reversed problems reveal substantial behavioral shifts: models sometimes fail on reversed problems and even incorrectly output the original answer, suggesting memorization-like behavior. For training, ReverseMath provides automatically labeled reversed problems as data augmentation for reinforcement learning (RL).
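A minimal sketch of how both uses could be scored, assuming answers are compared after simple numeric normalization. The outcome labels (`correct`, `echoed_original`, `other_error`) and the binary exact-match reward are assumptions made for illustration; the abstract does not describe the paper's actual metrics or RL setup.

```python
def normalize(ans: str) -> str:
    """Canonicalize an answer string so '4', '4.0', and ' 4 ' compare equal."""
    text = ans.strip().replace(",", "")
    try:
        number = float(text)
    except ValueError:
        # Non-numeric answers fall back to case-insensitive string comparison.
        return text.lower()
    return str(int(number)) if number.is_integer() else str(number)


def classify(model_output: str, reversed_answer: str, original_answer: str) -> str:
    """Label a model's response to a reversed problem (hypothetical categories)."""
    out = normalize(model_output)
    if out == normalize(reversed_answer):
        return "correct"
    if out == normalize(original_answer):
        # The model answered the original question instead of the reversed one.
        return "echoed_original"
    return "other_error"


def reward(model_output: str, reversed_answer: str) -> float:
    """Binary verifiable reward: 1.0 on a match with the known label, else 0.0."""
    return 1.0 if normalize(model_output) == normalize(reversed_answer) else 0.0


# Reversed problem with new answer 4; the original problem's answer was 12.
print(classify("4.0", "4", "12"))
print(classify("12", "4", "12"))
print(reward("12", "4"))
```

Counting the `echoed_original` cases separately from other errors is what would distinguish memorization-like behavior from ordinary mistakes in such a setup.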

Experiments show that including ReverseMath-generated data improves mathematical reasoning performance across multiple benchmarks, demonstrating its value as both an analysis tool and a scalable source of verifiable training data.
