Learning on the Fly: Font-Free Approaches to Difficult OCR Problems



Despite ubiquitous claims that optical character recognition (OCR) is a "solved problem," many categories of documents, such as those with moderate degradation or unusual fonts, continue to break modern OCR software. Many approaches rely on pre-computed or stored character models, but these are vulnerable when the font of a particular document was not part of the training set, or when a document is so noisy that the font model becomes weak. To address these difficult cases, we present a form of iterative contextual modeling that learns character models directly from the document it is trying to recognize. We use these learned models both to segment the characters and to recognize them in an incremental, iterative process. On a subset of characters from a difficult test document, our results are comparable to those of a commercial OCR system.
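The abstract does not spell out the algorithm, so what follows is only a minimal sketch of the general idea of learning character models from the document itself. It is not the authors' method. It assumes glyphs have already been segmented and scaled to equal-size, flattened binary bitmaps (a hypothetical representation), and it swaps in a simple, well-known technique: greedy leader clustering on normalized Hamming distance, followed by refinement passes that rebuild each cluster's prototype by per-pixel majority vote. The names `hamming`, `majority`, `learn_templates` and the `threshold` value are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of pixels that differ between two equal-size binary bitmaps."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def majority(members: List[Sequence[int]]) -> List[int]:
    """Per-pixel majority vote over a cluster; ties are broken toward ink (1)."""
    n = len(members)
    return [1 if 2 * sum(col) >= n else 0 for col in zip(*members)]


def learn_templates(glyphs: List[Sequence[int]], threshold: float = 0.2,
                    max_iters: int = 10) -> Tuple[List[List[int]], List[int]]:
    """Hypothetical sketch: learn document-specific glyph prototypes.

    No font is supplied; every prototype is built only from glyphs
    found in the document being processed.
    """
    templates: List[List[int]] = []
    labels: List[int] = []
    for _ in range(max_iters):
        new_labels: List[int] = []
        for g in glyphs:
            # Find the closest existing prototype, if there is one.
            best = min(range(len(templates)),
                       key=lambda k: hamming(g, templates[k]), default=None)
            if best is not None and hamming(g, templates[best]) <= threshold:
                new_labels.append(best)
            else:
                # No prototype is close enough: this glyph seeds a new one.
                templates.append(list(g))
                new_labels.append(len(templates) - 1)
        # Drop empty clusters, renumber the rest, and rebuild each prototype
        # from the glyphs currently assigned to it.
        used = sorted(set(new_labels))
        remap = {old: new for new, old in enumerate(used)}
        new_labels = [remap[lab] for lab in new_labels]
        templates = [majority([g for g, lab in zip(glyphs, new_labels) if lab == k])
                     for k in range(len(used))]
        if new_labels == labels:  # assignments are stable: stop iterating
            break
        labels = new_labels
    return templates, labels


# Usage: two noisy copies each of two made-up 10-pixel "characters".
glyphs = [
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 1],  # first shape plus one noisy pixel
    [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],  # second shape plus one noisy pixel
]
templates, labels = learn_templates(glyphs)
print(len(templates), labels)
```

Each learned prototype is an unlabeled, document-specific character model. Mapping clusters to actual characters, and using the models to drive segmentation, would require the additional contextual modeling the paper describes; this sketch covers only the font-free grouping step.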

Andrew Kae, Erik G. Learned-Miller


Character Models | Document | Document Analysis | Font | ICDAR 2009


Added	18 Feb 2011
Updated	18 Feb 2011
Type	Conference
Year	2009
Where	ICDAR
Authors	Andrew Kae, Erik G. Learned-Miller

Sciweavers
