Scaling Accessible Mathematics on arXiv: HTML Conversion and MathML 4

arXiv cs.CL·Deyan Ginev, Brian Caruso, Bruce Miller, Jeff Sank, Jacob Weiskoff


Quick Take

arXiv is improving its HTML Papers service with higher-fidelity conversion, initial MathML 4 Intent annotations, and an in-progress Rust port of LaTeXML that reduces compute costs.

Key Points

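Community-driven improvements to HTML fidelity and service health, with roughly half of 6,000 user reports resolved.
Corpus-scale conversion work targets 90% error-free HTML (currently 75%).
Initial MathML 4 Intent annotations for accessible speech output.
An in-progress Rust port of LaTeXML reduces compute costs and enables faster previews on submission.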


[Submitted on 15 May 2026]


Abstract: We report on the ongoing development of arXiv's HTML Papers offering, available on every new TeX/LaTeX submission since its initial release in 2023.
The main highlights from 2025 and early 2026 are:
(i) community-driven improvements to HTML fidelity and service health, with roughly half of 6,000 user reports resolved;
(ii) corpus-scale conversion work aimed at 90% error-free HTML (currently 75%);
(iii) initial MathML 4 Intent annotations for accessible speech output;
(iv) an in-progress Rust port of LaTeXML, reducing compute costs and enabling faster previews on submission.
The arXiv HTML Papers project remains experimental, but is gradually maturing as we better understand the needs of arXiv's readers and the technical opportunities presented by new standards and by advances in programming languages and AI.
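
To make highlight (iii) concrete, here is a minimal sketch of what an Intent annotation can look like in markup. It is an illustration only, not output from LaTeXML or from arXiv's pipeline: the helper name power_with_intent is hypothetical, and the `intent`/`arg` pattern follows the style of examples in the W3C MathML 4 draft. The idea is that a speech tool can read the annotated expression as "x to the power n" instead of guessing from the layout.

```python
import xml.etree.ElementTree as ET

# Standard MathML namespace URI.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def power_with_intent(base: str, exponent: str) -> str:
    """Build <msup> markup for base^exponent with an intent hint.

    Hypothetical helper for illustration; both arguments are
    treated as plain identifiers (<mi>).
    """
    math = ET.Element("math", {"xmlns": MATHML_NS})
    # The intent expression names the concept and refers to the
    # children through their arg labels.
    msup = ET.SubElement(math, "msup", {"intent": "power($base,$exp)"})
    base_el = ET.SubElement(msup, "mi", {"arg": "base"})
    base_el.text = base
    exp_el = ET.SubElement(msup, "mi", {"arg": "exp"})
    exp_el.text = exponent
    return ET.tostring(math, encoding="unicode")


markup = power_with_intent("x", "n")
print(markup)
```

Without the `intent` attribute, the same `<msup>` could equally denote a power, an index, or a transpose-like decoration, which is the ambiguity such annotations are meant to remove for assistive technology.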

Comments:	6 pages, ICMS 2026
Subjects:	Computation and Language (cs.CL); Digital Libraries (cs.DL)
MSC classes:	68U15 (Primary) 68V25, 68U35 (Secondary)
ACM classes:	I.7.2; H.3.7
Cite as:	arXiv:2605.16562 [cs.CL]
	(or arXiv:2605.16562v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2605.16562 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Deyan Ginev
[v1] Fri, 15 May 2026 19:04:45 UTC (25 KB)

Originally published at arxiv.org
