LeanMarathon: Toward Reliable AI Co-Mathematicians through Long-Horizon Lean Autoformalization

arXiv cs.AI·Yuanhe Zhang, Yuekai Sun, Taiji Suzuki, Jason D. Lee, Fanghui Liu

6/6/2026

Quick Answer

LeanMarathon is a multi-agent harness for research-level Lean autoformalization. Across three autonomous runs on two recent research papers, it formalized all seven target theorems with no remaining sorry placeholders.

Quick Take

LeanMarathon coordinates four contract-scoped agents around an evolving Lean blueprint, so that a long formalization stays faithful to its target statements and breaks down into local, recoverable, parallel steps instead of one brittle multi-hour run.

Key Points

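Across three autonomous runs, LeanMarathon formalizes all seven target theorems from two recent research papers spanning four Erdős problems (#1051, #1196, #164, #1217), proving 258 lemmas and theorems with no sorry.
The core abstraction is an evolving blueprint: a single Lean file that serves as formal proof skeleton, natural-language proof graph, and shared system of record.
Four contract-scoped agents construct, audit, prove, and repair the blueprint.
A two-stage orchestrator first stabilizes target fidelity through adversarial review, then discharges the proof DAG from its leaves upward in parallel, CI-gated rounds.
The authors conclude that reliable AI co-mathematics needs durable harnesses that preserve target fidelity across long developments, not only stronger provers.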

Article Content

From source RSS / original summary

arXiv:2606.05400v1. Abstract: Long-horizon autoformalization of research mathematics fails not only at hard lemmas, but at scale: statements drift, dependencies tangle, context decays, and local repairs corrupt distant work. We present LeanMarathon, a multi-agent harness for reliable research-level Lean autoformalization. Its core abstraction is an evolving blueprint: a Lean file that serves simultaneously as formal proof skeleton, natural-language proof graph, and shared system of record.
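To make the blueprint idea concrete, here is a minimal Lean 4 sketch of what a file acting as both proof skeleton and proof graph might look like. It is not taken from the LeanMarathon repository: the lemma names, the toy statements, and the docstring layout are all assumptions chosen for illustration. The leaf is stated but still open (sorry), while the target is already reduced to the leaf, so the dependency edge is visible in the file itself.

```lean
/-- Leaf lemma (hypothetical). Informal proof: commutativity of addition on
    the natural numbers. Depends on: nothing. Status: open. -/
theorem leaf_add_comm (a b : Nat) : a + b = b + a := by
  sorry

/-- Target theorem (hypothetical). Informal proof: swap `a + b` on the left,
    then swap the outer sum. Depends on: `leaf_add_comm`. -/
theorem target (a b c : Nat) : (a + b) + c = c + (b + a) := by
  rw [leaf_add_comm a b]
  exact leaf_add_comm (b + a) c
```

In a skeleton of this shape, the target is only fully proved once every sorry beneath it has been discharged, which is why a result reported "with no sorry" is a meaningful completeness claim.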

Four contract-scoped agents construct, audit, prove, and repair this blueprint. These agents are coordinated by a two-stage orchestrator that first stabilizes target fidelity through adversarial review and then discharges the proof directed acyclic graph (DAG) from its dynamic leaves upward in parallel CI-gated rounds. LeanMarathon turns one brittle multi-hour run into many local, recoverable, parallel transactions.
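The leaves-upward discharge described above can be sketched as layered scheduling over a dependency graph. The Python below is an illustrative assumption, not the authors' orchestrator: `discharge`, `prove`, and `ci_gate` are hypothetical names, and the prover and CI check are stand-in callbacks. In each round, every node whose dependencies are already proved counts as a current leaf, the leaves are attempted in parallel, and only results that pass the gate are committed.

```python
from concurrent.futures import ThreadPoolExecutor


def discharge(deps, prove, ci_gate):
    """Discharge a proof DAG from its leaves upward in gated rounds.

    deps    : dict mapping each node to the set of nodes it depends on
    prove   : callback(node) -> bool, a stand-in for a prover agent
    ci_gate : callback(node) -> bool, a stand-in for a CI check
    Returns the list of rounds, each a sorted list of accepted nodes.
    """
    proved = set()
    remaining = {node: set(d) for node, d in deps.items()}
    rounds = []
    while remaining:
        # A node is a leaf for this round once all its dependencies are proved.
        leaves = sorted(n for n, d in remaining.items() if d <= proved)
        if not leaves:
            raise ValueError("cycle or missing dependency in proof graph")
        # Attempt every current leaf in parallel.
        with ThreadPoolExecutor() as pool:
            results = dict(zip(leaves, pool.map(prove, leaves)))
        # Commit only proofs that succeeded and passed the gate.
        accepted = [n for n in leaves if results[n] and ci_gate(n)]
        if not accepted:
            raise RuntimeError("no leaf could be discharged this round")
        for n in accepted:
            proved.add(n)
            del remaining[n]
        rounds.append(accepted)
    return rounds


if __name__ == "__main__":
    graph = {
        "thm": {"lem_a", "lem_b"},
        "lem_a": {"lem_c"},
        "lem_b": set(),
        "lem_c": set(),
    }
    # Prints [['lem_b', 'lem_c'], ['lem_a'], ['thm']]
    print(discharge(graph, prove=lambda n: True, ci_gate=lambda n: True))
```

A failed or rejected leaf simply stays in the graph and is retried in the next round, which mirrors the idea of local, recoverable transactions; a real harness would presumably invoke a repair step rather than retry blindly.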

We evaluate LeanMarathon on two recent research papers spanning four Erdős problems (#1051, #1196, #164, #1217). Across three autonomous runs, it formalizes all seven target theorems with no sorry, proving 258 lemmas and theorems. These results show that reliable AI co-mathematics requires not only stronger provers, but durable harnesses that preserve target fidelity across long mathematical developments. The code can be found at https://github.com/YuanheZ/LeanMarathon.
