How Do Document Parsers Break? Auditing Structural Vulnerability in Document Intelligence

arXiv cs.CL·Yue Chen, Yihao Wang, Ziyi Tang, Keze Wang

5/20/2026


Quick Answer

This study introduces a lightweight auditing framework for Document Layout Analysis (DLA) that evaluates structural vulnerabilities in document intelligence systems.

Quick Take

This study introduces a lightweight auditing framework for Document Layout Analysis (DLA) that evaluates structural vulnerabilities in document intelligence systems. Using Block-level Structural Loss Rate (B-SLR) and exposure descriptors, the framework shows that small, structurally targeted probes can degrade QA and retrieval performance about as much as larger-footprint perturbations, shifting the focus from footprint-based stress testing to structure-aware auditing.

Key Points

Proposed a framework combining B-SLR and exposure descriptors for auditing DLA robustness.
B-SLR tracks perturbation-induced OCR instability far more closely than affected area does (R^2 = 0.727/0.916 versus 0.384/0.110 on MinerU/PP-StructureV3).
Small structural probes can degrade QA/retrieval performance comparably to larger perturbations.
Evaluated two DLA pipelines, MinerU and PP-StructureV3, on 1,000 pages.
Shifts evaluation focus from footprint-based stress testing to structure-aware auditing.
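
The R^2 figures quoted above compare how well two page-level predictors (affected area and B-SLR) explain OCR instability. As a rough illustration of that kind of comparison, the sketch below computes R^2 for a one-variable least-squares line fit. The function name `r_squared` and the toy numbers are assumptions made for illustration; they are not taken from the paper, and the paper's exact regression setup is not given in this summary.

```python
from statistics import mean


def r_squared(x, y):
    """R^2 of the least-squares line y ~ a*x + b (one predictor).

    For simple linear regression this equals the squared Pearson
    correlation between x and y.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        # A constant predictor or constant target explains no variance.
        return 0.0
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical per-page values, invented purely for this demo.
    affected_area = [0.02, 0.10, 0.05, 0.20, 0.01]
    structural_loss = [0.50, 0.10, 0.40, 0.30, 0.00]
    ocr_instability = [0.45, 0.15, 0.35, 0.30, 0.05]
    print("area vs instability:", r_squared(affected_area, ocr_instability))
    print("loss vs instability:", r_squared(structural_loss, ocr_instability))
```

A higher value means the predictor accounts for more of the variance in the instability measure, which is the sense in which the summary says B-SLR "aligns more closely" than affected area.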


[Submitted on 19 May 2026]


Abstract: Document Layout Analysis (DLA) pipelines provide structured page representations for retrieval-augmented generation, long-document question answering, and other document intelligence systems, yet their robustness evaluation remains largely area-centric. We identify this Footprint Bias and propose a lightweight output-level auditing framework that decouples probe construction, policy-driven targeting, and structure-aware diagnosis. The framework combines Block-level Structural Loss Rate (B-SLR), granularity-aware exposure descriptors, and pathway attribution to analyze where perturbations interact with layout structure and how failures propagate. Across MinerU and PP-StructureV3 on 1,000 pages, affected area weakly tracks perturbation-induced OCR instability (R^2 = 0.384/0.110), whereas B-SLR aligns much more closely with it (R^2 = 0.727/0.916). Exposure descriptors further separate occlusion- and topology-dominant pathways, and small structurally targeted probes cause downstream QA/retrieval degradation comparable to larger-footprint perturbations. These results shift DLA robustness evaluation from footprint-based stress testing toward structure-aware vulnerability auditing.
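
The abstract names B-SLR but does not give its formula here. To make the idea of a block-level, output-level loss measure concrete, the sketch below uses a stand-in definition: the fraction of layout blocks from the clean parse that have no same-type counterpart with sufficient overlap in the perturbed parse. This definition, the `Block` type, the greedy matching, and the 0.5 IoU threshold are all assumptions for illustration and may differ from the paper's B-SLR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A parsed layout block (hypothetical schema): type plus bounding box."""
    kind: str  # e.g. "text", "table", "figure"
    x0: float
    y0: float
    x1: float
    y1: float


def iou(a: Block, b: Block) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = min(a.x1, b.x1) - max(a.x0, b.x0)
    ih = min(a.y1, b.y1) - max(a.y0, b.y0)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a.x1 - a.x0) * (a.y1 - a.y0)
    area_b = (b.x1 - b.x0) * (b.y1 - b.y0)
    return inter / (area_a + area_b - inter)


def block_loss_rate(clean, perturbed, iou_threshold=0.5):
    """Share of clean-parse blocks with no matching perturbed-parse block.

    Assumed stand-in for a block-level structural loss measure: a clean
    block is "kept" if an unused perturbed block of the same kind overlaps
    it with IoU >= iou_threshold (greedy one-to-one matching).
    """
    if not clean:
        return 0.0
    unused = list(perturbed)
    lost = 0
    for ref in clean:
        best, best_score = None, iou_threshold
        for cand in unused:
            if cand.kind != ref.kind:
                continue
            score = iou(ref, cand)
            if score >= best_score:
                best, best_score = cand, score
        if best is None:
            lost += 1
        else:
            unused.remove(best)
    return lost / len(clean)


if __name__ == "__main__":
    clean = [Block("text", 0, 0, 10, 10), Block("table", 0, 20, 10, 30)]
    # A perturbation that makes the parser merge both regions into one text block.
    merged = [Block("text", 0, 0, 10, 30)]
    print(block_loss_rate(clean, merged))
```

The point of a measure like this is that it depends on what happens to the parsed structure rather than on how many pixels a probe covers, so a small probe that causes two blocks to merge can score higher than a large probe that leaves the block layout intact.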

Comments:	19 pages, preprint
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2605.19309 [cs.CL]
	(or arXiv:2605.19309v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2605.19309 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Yue Chen
[v1] Tue, 19 May 2026 03:44:09 UTC (2,917 KB)

— Originally published at arxiv.org


