The decomposition of a document into segments such as text regions and graphics is a significant part of the document analysis process. The basic requirement for rating and impro...
The goal of this work is to add the capability to segment documents containing text, graphics, and pictures in the open source OCR engine OCRopus. To achieve this goal, OCRopus...
Amy Winder, Tim L. Andersen, Elisa H. Barney Smith
This paper presents part of a new DIA performance analysis framework aimed at Layout Analysis algorithm developers. A new region-representation scheme (an interval-based descripti...
This paper presents a quantitative comparison of six algorithms for page segmentation: X-Y cut, smearing, whitespace analysis, constrained text-line finding, Docstrum, an...