End-to-End Text Line Detection and Ordering
Quick Take
Orli is an end-to-end model for text line detection and ordering that generates baselines in reading order directly from page images. Without dataset-specific training it marginally exceeds the previously reported state of the art for cBAD line detection, reaches near-perfect coverage and ordering on multiple reading-order benchmarks zero-shot, and adapts to more specialized out-of-domain layouts with limited fine-tuning.
Key Points
- Orli combines line detection and reading order into a single image-to-sequence model.
- Trained on a heterogeneous corpus of 196,691 pages spanning ten writing systems.
- Marginally exceeds the previously reported state of the art for cBAD line detection without dataset-specific training.
- Reaches near-perfect coverage and ordering on multiple reading-order benchmarks zero-shot.
- Source code and model weights are available under an open license on GitHub.
Article Content
arXiv:2606.04166v1. Abstract: Practical text-recognition pipelines for historical documents typically decompose layout analysis into line detection followed by a separate reading-order step, with the latter most often handled by a hand-coded geometric heuristic that struggles with marginalia, multiple columns, tables, and source-specific editorial conventions.
This article introduces Orli (Ordered Regression of Lines), an end-to-end model that casts both sub-tasks as a single image-to-sequence problem: from a page image, Orli autoregressively generates text-line baselines directly in reading order. Baselines are represented in a chord-frame parameterization that anchors a line's position, orientation, and extent while encoding local geometry through perpendicular offsets; an iterative refinement head and a local visual refiner produce the final curve.
Trained on a heterogeneous corpus of 196,691 pages spanning ten writing systems, Orli marginally exceeds the previously reported state of the art for cBAD line detection without dataset-specific training, reaches near-perfect coverage and ordering on multiple reading-order benchmarks zero-shot, and adapts to more specialized out-of-domain layouts with limited fine-tuning. The method's source code and model weights are available under an open license at https://github.com/mittagessen/orli.
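The abstract does not give the exact form of the chord-frame parameterization, so the following is only a rough sketch of one way such an encoding could work, not Orli's actual implementation. It assumes the chord is the straight segment between a baseline's first and last points, that position, orientation, and extent are stored as the chord's midpoint, angle, and length, and that local geometry is stored as signed perpendicular offsets sampled at evenly spaced positions along the chord. The function names, the sample count `k`, and the sign convention for the normal are all hypothetical.

```python
import math


def _interp(frame, t):
    """Linearly interpolate the perpendicular offset at chord position t.

    `frame` holds (u, v) pairs: u is the normalized position along the chord,
    v the signed perpendicular offset. Assumes u is non-decreasing.
    """
    for (ua, va), (ub, vb) in zip(frame, frame[1:]):
        if ua <= t <= ub:
            if ub == ua:
                return va
            return va + (vb - va) * (t - ua) / (ub - ua)
    # t fell outside the sampled range (e.g. rounding at the chord end),
    # where the baseline meets the chord and the offset is zero.
    return 0.0


def encode_chord_frame(points, k=5):
    """Encode a polyline baseline as (cx, cy, theta, length, offsets).

    Hypothetical encoding: midpoint, angle, and length of the chord, plus
    k perpendicular offsets at evenly spaced positions along it (k >= 2).
    """
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("baseline endpoints coincide")
    ux, uy = dx / length, dy / length  # unit vector along the chord
    nx, ny = -uy, ux                   # unit normal (sign is a convention)
    # Express every baseline point in the chord's local frame.
    frame = [
        (((x - x0) * ux + (y - y0) * uy) / length,
         (x - x0) * nx + (y - y0) * ny)
        for x, y in points
    ]
    offsets = [_interp(frame, i / (k - 1)) for i in range(k)]
    return (x0 + x1) / 2, (y0 + y1) / 2, math.atan2(dy, dx), length, offsets


def decode_chord_frame(cx, cy, theta, length, offsets):
    """Rebuild polyline points from the chord-frame parameters."""
    ux, uy = math.cos(theta), math.sin(theta)
    nx, ny = -uy, ux
    x0, y0 = cx - ux * length / 2, cy - uy * length / 2
    k = len(offsets)
    return [
        (x0 + ux * length * i / (k - 1) + nx * v,
         y0 + uy * length * i / (k - 1) + ny * v)
        for i, v in enumerate(offsets)
    ]
```

Under these assumptions, a gently arched baseline such as `[(0, 0), (50, 10), (100, 0)]` encodes to a chord centred at (50, 0) with angle 0 and length 100, and the arch is captured entirely by the offsets. Separating the global placement of a line from its local curvature in this way is what would allow a model to predict the coarse chord first and refine the curve afterwards.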