
ICCV
2005
IEEE

Learning Non-Generative Grammatical Models for Document Analysis

We present a general approach for the hierarchical segmentation and labeling of document layout structures. This approach models document layout as a grammar and performs a global search for the optimal parse based on a grammatical cost function. Our contribution is to use machine learning to discriminatively select features and set all parameters in the parsing process. As a result, and unlike many other approaches to layout analysis, ours can easily adapt to a variety of document analysis problems: one need only specify the page grammar and provide a set of correctly labeled pages. Experiments demonstrate the effectiveness of this technique on two document image analysis tasks: page layout structure extraction and mathematical expression interpretation. The first set of experiments shows that the learned grammars can be used to extract the document structure in 57 files from the UWIII document image database. A second set of experiments demonstrates that the same framework can be...
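The abstract frames layout analysis as a global search for the lowest-cost parse under a page grammar. A minimal sketch of that search idea, assuming a toy binary grammar over a 1D sequence of text lines in reading order and hand-set costs standing in for learned ones. This is not the authors' system: their parser handles 2D page layouts and learns its costs and features discriminatively, which this sketch does not attempt. The rule names, cost values, `terminal_cost`, and `best_parse` are all hypothetical. The search itself is standard CKY dynamic programming.

```python
import math
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical toy grammar in Chomsky normal form: (parent, left, right) -> rule cost.
# The paper learns its costs; these numbers are made up for illustration only.
RULES: Dict[Tuple[str, str, str], float] = {
    ("Page", "Title", "Body"): 0.5,
    ("Body", "Line", "Body"): 0.1,
    ("Body", "Line", "Line"): 0.1,
}
PRETERMINALS = ("Title", "Line")

Tree = Union[Tuple[str, str], Tuple[str, "Tree", "Tree"]]


def terminal_cost(token: str, label: str) -> float:
    """Hypothetical stand-in for a learned cost of labeling one text line."""
    if label == "Title":
        return 0.0 if token.isupper() else 5.0
    if label == "Line":
        return 2.0 if token.isupper() else 0.0
    return math.inf


def best_parse(tokens: List[str], start: str = "Page") -> Optional[Tuple[float, Tree]]:
    """CKY search for the lowest-cost parse of tokens rooted at `start`."""
    n = len(tokens)
    # chart[i][j][symbol] = (best cost, backpointer) for the span tokens[i:j]
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tok in enumerate(tokens):
        for label in PRETERMINALS:
            chart[i][i + 1][label] = (terminal_cost(tok, label), tok)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = chart[i][j]
            for k in range(i + 1, j):  # every split point of the span
                for (parent, left, right), rule_cost in RULES.items():
                    if left in chart[i][k] and right in chart[k][j]:
                        # Additive cost: rule cost plus the best costs of both children.
                        c = rule_cost + chart[i][k][left][0] + chart[k][j][right][0]
                        if parent not in cell or c < cell[parent][0]:
                            cell[parent] = (c, (left, right, k))
    if start not in chart[0][n]:
        return None  # no parse of the whole sequence rooted at `start`

    def build(i: int, j: int, sym: str) -> Tree:
        back = chart[i][j][sym][1]
        if isinstance(back, str):  # preterminal covering a single token
            return (sym, back)
        left, right, k = back
        return (sym, build(i, k, left), build(k, j, right))

    return chart[0][n][start][0], build(0, n, start)
```

For example, `best_parse(["RESULTS", "first line", "second line"])` labels the all-caps line as the `Title` and the remaining two lines as a `Body`, at a total cost of 0.6. Because every span and split point is considered, the returned parse is globally optimal under the additive cost, at O(n³·|R|) time for n lines and |R| rules.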
Michael Shilman, Percy Liang, Paul A. Viola
Added 24 Jun 2010
Updated 24 Jun 2010
Type Conference
Year 2005
Where ICCV