How Do Document Parsers Break? Auditing Structural Vulnerability in Document Intelligence
Quick Take
The study proposes a framework for auditing structural vulnerabilities in Document Layout Analysis systems.
Key Points
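- Identifies Footprint Bias: DLA robustness evaluation that remains largely area-centric.
- Introduces Block-level Structural Loss Rate (B-SLR) as a structure-aware diagnostic metric.
- Shifts DLA robustness evaluation from footprint-based stress testing toward structure-aware vulnerability auditing.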
Abstract: Document Layout Analysis (DLA) pipelines provide structured page representations for retrieval-augmented generation, long-document question answering, and other document intelligence systems, yet their robustness evaluation remains largely area-centric. We identify this Footprint Bias and propose a lightweight output-level auditing framework that decouples probe construction, policy-driven targeting, and structure-aware diagnosis. The framework combines Block-level Structural Loss Rate (B-SLR), granularity-aware exposure descriptors, and pathway attribution to analyze where perturbations interact with layout structure and how failures propagate. Across MinerU and PP-StructureV3 on 1,000 pages, affected area weakly tracks perturbation-induced OCR instability (R² = 0.384/0.110), whereas B-SLR aligns much more closely with it (R² = 0.727/0.916). Exposure descriptors further separate occlusion- and topology-dominant pathways, and small structurally targeted probes cause downstream QA/retrieval degradation comparable to larger-footprint perturbations. These results shift DLA robustness evaluation from footprint-based stress testing toward structure-aware vulnerability auditing.
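The abstract does not say how B-SLR is computed, so the sketch below assumes a simple definition purely for illustration: the fraction of layout blocks on the clean page that have no same-label counterpart with IoU ≥ 0.5 in the perturbed output. `Block`, `iou`, `block_structural_loss_rate`, and the 0.5 threshold are hypothetical names and choices, not the authors' implementation. The sketch also computes R² for an ordinary least-squares line, the statistic the abstract uses to compare how closely affected area and B-SLR track OCR instability.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Block:
    """A layout block: axis-aligned box (x0, y0, x1, y1) plus a type label (hypothetical schema)."""
    x0: float
    y0: float
    x1: float
    y1: float
    label: str

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def iou(a: Block, b: Block) -> float:
    """Intersection-over-union of two boxes; 0.0 when the union is empty."""
    ix = max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))
    iy = max(0.0, min(a.y1, b.y1) - max(a.y0, b.y0))
    inter = ix * iy
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def block_structural_loss_rate(
    clean: Sequence[Block], perturbed: Sequence[Block], iou_threshold: float = 0.5
) -> float:
    """Assumed B-SLR: share of clean blocks with no same-label perturbed block at IoU >= threshold.

    Matching is many-to-one: a single perturbed block may count as the match
    for several clean blocks. The 0.5 default threshold is an assumption.
    """
    if not clean:
        return 0.0
    lost = sum(
        1
        for c in clean
        if not any(p.label == c.label and iou(c, p) >= iou_threshold for p in perturbed)
    )
    return lost / len(clean)


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """R^2 of an ordinary least-squares line y = a + b*x (equals Pearson r squared)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # R^2 is undefined for a constant series; treated as 0.0 here by convention
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    clean_page = [
        Block(0, 0, 100, 20, "title"),
        Block(0, 30, 100, 200, "text"),
        Block(0, 210, 100, 300, "table"),
    ]
    # A perturbation that shifts the text block slightly and drops the table entirely.
    perturbed_page = [
        Block(0, 0, 100, 20, "title"),
        Block(0, 35, 100, 200, "text"),
    ]
    print("B-SLR:", block_structural_loss_rate(clean_page, perturbed_page))
```

Here the shifted text block still matches (IoU of about 0.97), so only the lost table counts, giving a B-SLR of one third regardless of how much page area the perturbation touched. Running `r_squared` over per-page (metric, instability) pairs, once with affected area and once with this B-SLR, would mirror the kind of comparison the abstract reports, although the paper's exact OCR-instability measure is not given here. A stricter audit would replace the many-to-one check with one-to-one matching so that one merged block cannot stand in for several lost ones.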
- Comments: 19 pages, preprint
- Subjects: Computation and Language (cs.CL)
- Cite as: arXiv:2605.19309 [cs.CL] (or arXiv:2605.19309v1 [cs.CL] for this version)
- DOI: https://doi.org/10.48550/arXiv.2605.19309 (arXiv-issued DOI via DataCite, pending registration)
Submission history
From: Yue Chen
[v1] Tue, 19 May 2026 03:44:09 UTC (2,917 KB)
Originally published at arxiv.org