Operationalizing Document AI: A Microservice Architecture for OCR and LLM Pipelines in Production
Quick Take
The paper presents a microservice architecture for efficient document AI pipelines in production environments.
Key Points
- Focuses on bridging model development and production deployment.
- Finds that OCR, not language-model parsing, dominates end-to-end latency.
- Suggests architectural patterns for scalable document understanding systems.
Authors: Yao Fehlis, Benjamin Bengfort, Zhangzhang Si, Vahid Eyorokon, Prema Roman, Patrick Deziel, Devon Slonaker, Steve Veldman, Ben Johnson, Joyce Rigelo, Michael Wharton, Steve Kramer
Abstract: Academic research tends to focus on new models for document understanding, creating a wide gap in the literature between model definition and running models at production scale. To close that gap, we present a microservice architecture that encapsulates pipelines of multiple models for classification, optical character recognition (OCR), and large language model structured field extraction, as well as our experience running this pipeline on thousands of multi-page documents per hour. We describe our primary design decisions, including a hybrid classification, separation of GPU-bound inference from CPU-bound orchestration, use of asynchronous processing for the many IO-bound operations in the pipeline, and an independent, horizontal scaling strategy. Using batch profiling, we identified two surprising qualitative findings that shape production deployments: OCR, not language-model parsing, dominates end-to-end latency, and the system saturates at a concurrency determined by shared GPU-inference capacity rather than worker count. Our goal is to provide practitioners with concrete architectural patterns for building document understanding systems that work beyond the benchmark, effectively operationalizing models in production.
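The separation of CPU-bound orchestration from shared GPU inference, and the resulting saturation point, can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: `GpuInferencePool`, `process_document`, `worker`, and `run_pipeline` are hypothetical names, the GPU service is modeled as a semaphore with a fixed number of slots, and model latency is replaced by a short `asyncio.sleep`.

```python
import asyncio


class GpuInferencePool:
    """Hypothetical stand-in for a shared GPU inference service with fixed capacity."""

    def __init__(self, slots: int, seconds_per_call: float = 0.01):
        self._sem = asyncio.Semaphore(slots)  # shared GPU capacity (assumed model)
        self._seconds = seconds_per_call      # simulated latency, not a measurement
        self.in_flight = 0
        self.peak_in_flight = 0

    async def infer(self, stage: str, payload: str) -> str:
        async with self._sem:
            self.in_flight += 1
            self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
            try:
                await asyncio.sleep(self._seconds)  # placeholder for a remote model call
                return f"{stage}({payload})"
            finally:
                self.in_flight -= 1


async def process_document(doc_id: int, pages: list[str], gpu: GpuInferencePool) -> dict:
    # OCR every page concurrently, then run one extraction call over the joined text.
    texts = await asyncio.gather(*(gpu.infer("ocr", page) for page in pages))
    fields = await gpu.infer("extract", " ".join(texts))
    return {"doc_id": doc_id, "fields": fields}


async def worker(queue: asyncio.Queue, gpu: GpuInferencePool, results: list) -> None:
    # Orchestration worker: pulls documents and awaits inference without blocking.
    while True:
        doc_id, pages = await queue.get()
        try:
            results.append(await process_document(doc_id, pages, gpu))
        finally:
            queue.task_done()


async def run_pipeline(num_docs: int, pages_per_doc: int, num_workers: int, gpu_slots: int):
    gpu = GpuInferencePool(gpu_slots)
    queue: asyncio.Queue = asyncio.Queue()
    for d in range(num_docs):
        queue.put_nowait((d, [f"doc{d}-p{p}" for p in range(pages_per_doc)]))
    results: list = []
    workers = [asyncio.create_task(worker(queue, gpu, results)) for _ in range(num_workers)]
    await queue.join()  # wait until every queued document has been processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results, gpu.peak_in_flight


if __name__ == "__main__":
    for n in (2, 6):
        _, peak = asyncio.run(run_pipeline(6, 3, num_workers=n, gpu_slots=2))
        print(f"workers={n} peak concurrent GPU calls={peak}")
```

In this toy model, adding orchestration workers does not raise the number of inference calls running at once; only the assumed `gpu_slots` value does, which mirrors the abstract's observation that concurrency saturates at shared GPU-inference capacity rather than worker count.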
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE)
Cite as: arXiv:2605.18818 [cs.AI] (or arXiv:2605.18818v1 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.2605.18818 (arXiv-issued DOI via DataCite, pending registration)
Submission history
From: Yao Fehlis
[v1] Tue, 12 May 2026 13:07:34 UTC (20 KB)
Originally published at arxiv.org