This paper presents a new enhanced text extraction algorithm from degraded document images on the basis of the probabilistic models. The observed document image is considered as a...
Recognition and encoding of digitized historical documents is still a challenging and difficult task. A major problem is the occurrence of unknown glyphs and symbols which might n...
This paper proposes a novel dewarping technique for document images of bound volumes. This technique is a kind of model fitting techniques for estimating the warp of each text li...
We present a document understanding system in which the arrangement of lines of text and block separators within a document are modeled by stochastic context free grammars. A gram...
John C. Handley, Anoop M. Namboodiri, Richard Zani...
A spanned cell in a table is a single, complete unit that physically occupies multiple columns and/or multiple rows. Spanned cells are common in tables, and they are a significan...