How Do Document Parsers Break? Auditing Structural Vulnerability in Document Intelligence
Quick Take
This study introduces a lightweight auditing framework for Document Layout Analysis (DLA) that evaluates structural vulnerabilities in document intelligence systems. By utilizing Block-level Structural Loss Rate (B-SLR) and exposure descriptors, the framework reveals that small structural probes can cause significant degradation in QA and retrieval performance, shifting the focus from footprint-based testing to structure-aware auditing.
Key Points
- Proposed a framework combining B-SLR and exposure descriptors for auditing DLA robustness.
- B-SLR tracks perturbation-induced OCR instability far more closely than affected area does (R^2 = 0.727/0.916 versus 0.384/0.110 across MinerU and PP-StructureV3).
- Small structural probes can degrade QA/retrieval performance comparably to larger perturbations.
- The study evaluated two document parsing pipelines, MinerU and PP-StructureV3, on 1,000 pages.
- Shifts evaluation focus from footprint-based stress testing to structure-aware auditing.
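To make the contrast between footprint and structure concrete, here is a minimal Python sketch. This summary does not give the paper's formula for B-SLR, so `block_loss_rate` below is an assumed stand-in: the fraction of baseline layout blocks that have no counterpart in the perturbed output at an IoU of at least 0.5. The `Box` type, the threshold, and the toy page are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates (hypothetical helper type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def intersection(a: Box, b: Box) -> float:
    """Overlap area of two boxes (0.0 if they do not overlap)."""
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(0.0, w) * max(0.0, h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter = intersection(a, b)
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def block_loss_rate(baseline, perturbed, iou_threshold=0.5):
    """Assumed stand-in for a block-level loss rate, NOT the paper's definition:
    share of baseline blocks with no perturbed block at IoU >= threshold."""
    if not baseline:
        return 0.0
    lost = sum(
        1 for b in baseline
        if all(iou(b, p) < iou_threshold for p in perturbed)
    )
    return lost / len(baseline)


def affected_area_ratio(probe: Box, page: Box) -> float:
    """Footprint view: share of the page area covered by the probe."""
    return intersection(probe, page) / page.area()


# Toy page: two text columns separated by a narrow gutter.
page = Box(0, 0, 100, 100)
baseline_blocks = [Box(0, 0, 48, 100), Box(52, 0, 100, 100)]

# Hypothetical outcome: a thin probe in the gutter makes the parser
# merge both columns into a single block.
probe = Box(48, 0, 52, 100)
perturbed_blocks = [Box(0, 0, 100, 100)]

print(f"affected area:   {affected_area_ratio(probe, page):.2f}")   # 0.04
print(f"block loss rate: {block_loss_rate(baseline_blocks, perturbed_blocks):.2f}")  # 1.00
```

In this toy case the probe covers 4% of the page, yet neither original column survives as a block, which is the kind of gap between footprint and structural damage that the key points describe.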
Abstract: Document Layout Analysis (DLA) pipelines provide structured page representations for retrieval-augmented generation, long-document question answering, and other document intelligence systems, yet their robustness evaluation remains largely area-centric. We identify this Footprint Bias and propose a lightweight output-level auditing framework that decouples probe construction, policy-driven targeting, and structure-aware diagnosis. The framework combines Block-level Structural Loss Rate (B-SLR), granularity-aware exposure descriptors, and pathway attribution to analyze where perturbations interact with layout structure and how failures propagate. Across MinerU and PP-StructureV3 on 1,000 pages, affected area weakly tracks perturbation-induced OCR instability (R^2=0.384/0.110), whereas B-SLR aligns much more closely with it (R^2=0.727/0.916). Exposure descriptors further separate occlusion- and topology-dominant pathways, and small structurally targeted probes cause downstream QA/retrieval degradation comparable to larger-footprint perturbations. These results shift DLA robustness evaluation from footprint-based stress testing toward structure-aware vulnerability auditing.
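The R^2 comparison in the abstract can be illustrated with a few lines of Python. For a one-predictor linear fit with an intercept, R^2 equals the squared Pearson correlation, which is what the `r_squared` helper below computes. The per-page numbers are invented for illustration and are not the paper's data; the variable names are likewise hypothetical.

```python
def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (with intercept).

    For a single predictor this equals the squared Pearson correlation.
    Returns 0.0 if either variable is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


# Invented per-page measurements (NOT from the paper).
affected_area = [0.02, 0.05, 0.10, 0.20, 0.30, 0.40]    # share of page covered
structural_loss = [0.50, 0.10, 0.60, 0.20, 0.90, 0.40]  # block-level loss rate
ocr_instability = [0.45, 0.15, 0.55, 0.25, 0.80, 0.35]  # change in OCR output

print("R^2, affected area vs OCR instability:  ",
      round(r_squared(affected_area, ocr_instability), 3))
print("R^2, structural loss vs OCR instability:",
      round(r_squared(structural_loss, ocr_instability), 3))
```

A predictor with the higher R^2 explains more of the page-to-page variance in OCR instability, which is the sense in which the abstract says B-SLR aligns more closely with it than affected area does.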
| Comments: | 19 pages, preprint |
| Subjects: | Computation and Language (cs.CL) |
| Cite as: | arXiv:2605.19309 [cs.CL] (or arXiv:2605.19309v1 [cs.CL] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.19309 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Yue Chen
[v1] Tue, 19 May 2026 03:44:09 UTC (2,917 KB)
— Originally published at arxiv.org