Sciweavers

DAS
2010
Springer

Analysis of whole-book recognition

Whole-book recognition is a document image analysis strategy that operates on the complete set of a book’s page images, attempting to improve accuracy by automatic unsupervised adaptation. Our algorithm expects to be given initial iconic and linguistic models, derived from (generally errorful) OCR results and (generally incomplete) dictionaries. Then, guided entirely by evidence internal to the test set, it corrects the models, yielding improved accuracy. We have found that successful corrections are often closely associated with “disagreements” between the models, which can be detected within the test set by measuring the cross entropy between (a) the posterior probability distribution of character classes (the recognition results from image classification alone), and (b) the posterior probability distribution of word classes (the recognition results from image classification combined with linguistic constraints). We report experiments on long passages (up to 180 ...
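To make the "disagreement" measure concrete, here is a minimal sketch of a cross-entropy comparison between the two posteriors. It is not the authors' implementation. It assumes that the word-level posterior is first marginalized down to a distribution over character classes at one character position, so that both distributions share the same support. The function names, the toy probabilities, and the smoothing constant `eps` are all hypothetical.

```python
import math


def cross_entropy(p, q, eps=1e-12):
    """Return H(p, q) = -sum_c p(c) * log q(c) over character classes c.

    p and q are dicts mapping a character class to its probability.
    eps (an assumption of this sketch) avoids log(0) when q misses a class.
    """
    return -sum(
        pc * math.log(max(q.get(c, 0.0), eps))
        for c, pc in p.items()
        if pc > 0
    )


def char_marginal(word_posterior, position):
    """Marginalize a posterior over words to one over the character at `position`.

    This marginalization step is an assumption made for illustration only.
    """
    marginal = {}
    for word, prob in word_posterior.items():
        c = word[position]
        marginal[c] = marginal.get(c, 0.0) + prob
    return marginal


# Toy data (hypothetical): the image classifier alone favors "c",
# while the linguistically constrained word posterior favors "eat".
iconic = {"c": 0.7, "e": 0.3}
word_post = {"cat": 0.1, "eat": 0.9}

linguistic = char_marginal(word_post, 0)
agree = cross_entropy(iconic, iconic)         # baseline: entropy of iconic itself
disagree = cross_entropy(iconic, linguistic)  # larger when the models disagree
```

In this toy case `disagree` exceeds `agree`, which is the kind of signal the abstract describes as flagging a location where one of the models may need correction.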
Pingping Xiu, Henry S. Baird
Added 19 Jul 2010
Updated 19 Jul 2010
Type Conference
Year 2010
Where DAS
Authors Pingping Xiu, Henry S. Baird