Information extraction by finding repeated structure

Repetition of layout structure is prevalent in document images. In document design, such repetition conveys the underlying logical and functional structure of the data. For example, in invoices, the names, unit prices, quantities and other descriptors of every line item are laid out in a consistent spatial structure. We propose a general method for extracting such repeated structure from documents. After receiving a single example of the structure to be found, the proposed method localizes additional instances of this structure in the same document and in additional documents. A wide variety of perceptually motivated cues (such as alignment and saliency) is used for this purpose. These cues are combined in a probabilistic model, and a novel algorithm for exact inference in this model is proposed and used. We demonstrate that this method can cope with complex instances of repeated structure and generalizes successfully across a wide range of structure variations. Categories and Subject...
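As a rough illustration of the alignment cue only, the following toy sketch takes one example line item and looks for vertical shifts at which every field of the example has a left-aligned counterpart. It is not the authors' probabilistic model or their exact inference algorithm; the `Token` type, the `find_repeats` function, the tolerances, and the sample invoice data are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """A word on the page (hypothetical representation)."""
    text: str
    x: float  # left edge
    y: float  # baseline


def find_repeats(tokens, example, x_tol=2.0, y_tol=2.0, min_score=1.0):
    """Return (dy, matched_tokens) for each vertical shift of `example`
    whose fields line up with other tokens on the page."""
    anchor = example[0]
    example_set = set(example)
    results = []
    # Candidate shifts come from tokens left-aligned with the anchor field.
    for cand in tokens:
        if cand in example_set or abs(cand.x - anchor.x) > x_tol:
            continue
        dy = cand.y - anchor.y
        matched = []
        for field in example:
            # Look for a token where this field would land after the shift.
            hit = next(
                (t for t in tokens
                 if t not in example_set
                 and abs(t.x - field.x) <= x_tol
                 and abs(t.y - (field.y + dy)) <= y_tol),
                None,
            )
            if hit is not None:
                matched.append(hit)
        # Score is the fraction of example fields with an aligned counterpart.
        if len(matched) / len(example) >= min_score:
            results.append((dy, matched))
    return sorted(results, key=lambda r: r[0])


# Made-up invoice: a header, three line items, and a total.
tokens = [
    Token("Invoice", 10.0, 20.0),
    Token("Widget", 10.0, 100.0), Token("4.00", 200.0, 100.0), Token("2", 300.0, 100.0),
    Token("Gadget", 10.0, 120.0), Token("7.50", 200.0, 120.0), Token("1", 300.0, 120.0),
    Token("Gizmo", 10.0, 140.0), Token("1.25", 200.0, 140.0), Token("8", 300.0, 140.0),
    Token("Total", 200.0, 180.0),
]
example = tokens[1:4]  # the user marks the first line item
repeats = find_repeats(tokens, example)
```

Here the header is rejected because only one of the three fields aligns at that shift, while the two remaining line items are accepted. A single hard threshold on one cue is a deliberate simplification; the abstract describes combining many cues probabilistically instead.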
Evgeniy Bart, Prateek Sarkar
Added 10 Feb 2011
Updated 10 Feb 2011
Type Journal
Year 2010
Where DAS