LeanMarathon: Toward Reliable AI Co-Mathematicians through Long-Horizon Lean Autoformalization
Quick Answer
LeanMarathon is a multi-agent framework for reliable Lean autoformalization of research mathematics. Across three autonomous runs it formalized all seven target theorems from two recent research papers, leaving no sorry (Lean's placeholder for an unfinished proof).
Quick Take
LeanMarathon uses an evolving blueprint and four coordinated agents to preserve target fidelity across long mathematical developments, and it discharges proofs in parallel rounds rather than in one brittle multi-hour run.
Key Points
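- LeanMarathon formalizes all seven target theorems from two recent research papers spanning four Erdős problems, across three autonomous runs, proving 258 lemmas and theorems in total.
- The core abstraction is an evolving blueprint: a Lean file that serves as formal proof skeleton, natural-language proof graph, and shared system of record.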
- Four contract-scoped agents are used for construction, auditing, proving, and repairing the blueprint.
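- The two-stage orchestrator first stabilizes target fidelity through adversarial review, then discharges the proof DAG from its leaves upward in parallel, CI-gated rounds.
- The authors conclude that reliable AI co-mathematics needs not only stronger provers but also durable harnesses that preserve target fidelity across long developments.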
Article Content
From source RSS / original summary
arXiv:2606.05400v1 (announce type: new)
Abstract: Long-horizon autoformalization of research mathematics fails not only at hard lemmas, but at scale: statements drift, dependencies tangle, context decays, and local repairs corrupt distant work. We present LeanMarathon, a multi-agent harness for reliable research-level Lean autoformalization. Its core abstraction is an evolving blueprint: a Lean file that serves simultaneously as formal proof skeleton, natural-language proof graph, and shared system of record.
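To make the idea of a blueprint file concrete, here is a minimal Lean 4 sketch of what a proof skeleton with natural-language annotations could look like. The layout, the lemma names (lemma_a, target) and the "Uses:" convention are assumptions made for illustration; the abstract does not specify LeanMarathon's actual blueprint format.

```lean
-- Hypothetical blueprint fragment, not taken from the LeanMarathon repository.
-- Every statement is fixed up front; every proof starts as `sorry`.

/-- Leaf lemma. Informal proof: commutativity of addition on the naturals. -/
theorem lemma_a (a b : Nat) : a + b = b + a := by
  sorry

/-- Target theorem. Uses: lemma_a.
Informal proof: rewrite the inner sum with lemma_a, then commute the outer sum. -/
theorem target (a b c : Nat) : (a + b) + c = c + (b + a) := by
  sorry
```

In a file like this the doc comments carry the natural-language proof graph, the theorem statements form the formal skeleton, and the remaining sorry occurrences record what is still unproved.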
Four contract-scoped agents construct, audit, prove, and repair this blueprint. These agents are coordinated by a two-stage orchestrator that first stabilizes target fidelity through adversarial review and then discharges the proof directed acyclic graph (DAG) from its dynamic leaves upward in parallel CI-gated rounds. LeanMarathon turns one brittle multi-hour run into many local, recoverable, parallel transactions.
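The leaves-upward discharge order can be sketched with a few lines of Python. This is a simplified illustration rather than the authors' orchestrator: it assumes a fixed dependency graph, uses the standard-library graphlib module, and leaves out the CI gate, the repair agent, and the way leaves change as the blueprint evolves. The function name plan_rounds and the lemma names are hypothetical.

```python
from graphlib import TopologicalSorter


def plan_rounds(deps):
    """Group lemmas into rounds, leaves first.

    `deps` maps each lemma name to the set of lemmas it depends on.
    Every lemma in a round has all of its dependencies in earlier rounds,
    so the lemmas within one round could be proved in parallel.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if deps is not a DAG
    rounds = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # current leaves of the unproved DAG
        rounds.append(ready)
        sorter.done(*ready)  # assumption: every lemma in the round gets proved
    return rounds


# Hypothetical proof DAG: the main theorem needs two lemmas that share a leaf.
deps = {
    "main_theorem": {"lemma_b", "lemma_c"},
    "lemma_b": {"lemma_a"},
    "lemma_c": {"lemma_a"},
    "lemma_a": set(),
}

rounds = plan_rounds(deps)
for number, names in enumerate(rounds, start=1):
    print(f"round {number}: {names}")
```

A real harness would mark a lemma as done only after its proof passes CI, and would recompute the ready set whenever a repair adds or removes lemmas.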
We evaluate LeanMarathon on two recent research papers spanning four Erdős problems (#1051, #1196, #164, #1217). Across three autonomous runs, it formalizes all seven target theorems with no sorry, proving 258 lemmas and theorems. These results show that reliable AI co-mathematics requires not only stronger provers, but durable harnesses that preserve target fidelity across long mathematical developments. The code can be found at https://github.com/YuanheZ/LeanMarathon.