Sorries Are Not the Hard Part: An Expert-Review Case Study of a Semi-Autonomous Formalization

arXiv cs.AI·Vasily Ilin, Brian Nugent

6/15/2026

Quick Answer

This study highlights the limitations of semi-autonomous formalization in theorem proving, using Grothendieck's vanishing theorem as a case study.

Quick Take

Although the initial formalization compiled with no sorries, expert review revealed serious problems in definitions, theorem generality, and API design. The case shows that autoformalization needs to be judged on more than the number of closed sorries.

Key Points

Initial version of formalization had no sorries but failed expert review.
Expert review identified issues in definitions, theorem generality, and API design.
Agents adapted well to local feedback but struggled with broader design choices.
Study argues for evaluating autoformalization beyond just closed sorries.
Refactor process led to improved formalization but still faced expert scrutiny.
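
As a toy illustration of the first two points (not taken from the paper, and far simpler than Grothendieck's vanishing theorem), the Lean 4 sketch below contrasts a proof gap marked with `sorry`, a sorry-free theorem that is too specific to reuse, and a sorry-free theorem stated in general form. All three theorem names are hypothetical and chosen only for this example.

```lean
-- A proof gap: Lean accepts the declaration but warns that it uses `sorry`.
theorem add_zero_gap (n : Nat) : n + 0 = n := by
  sorry

-- Sorry-free, but stated for one concrete value only, so it is of little
-- use to later proofs. A sorry count alone would not flag this.
theorem two_add_zero : 2 + 0 = 2 := rfl

-- Sorry-free and general: closer to the form a reusable library would want.
theorem add_zero_general (n : Nat) : n + 0 = n := rfl
```

Both of the last two declarations would count as "no sorries", yet only the general one is useful downstream. That difference is the kind of issue a reviewer catches and a compiler does not.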

Source Excerpt

arXiv:2606.13925v1. Abstract: … can often close proof gaps in interactive theorem provers, but a verified theorem is not the same thing as a reusable library contribution. We study this distinction through a detailed case study: a semi-autonomous formalization of Grothendieck's vanishing theorem. The initial version compiles with no sorries, but an expert review found serious problems in definitions, theorem generality, file organization, and the API.

We then ran a review-driven refactor and compression process and obtained a second expert review. …
