Sciweavers

» Topic based language models for OCR correction
EMNLP
2010
13 years 3 months ago
Evaluating Models of Latent Document Semantics in the Presence of OCR Errors
Models of latent document semantics such as the mixture of multinomials model and Latent Dirichlet Allocation have received substantial attention for their ability to discover top...
Daniel David Walker, William B. Lund, Eric K. Ring...
ICPR
2000
IEEE
13 years 10 months ago
Stochastic Error-Correcting Parsing for OCR Post-Processing
In this paper, stochastic error-correcting parsing is proposed as a powerful and flexible method to post-process the results of an optical character recognizer (OCR). Determinist...
Juan Carlos Pérez-Cortes, Juan-Carlos Ameng...
ICDAR
2009
IEEE
14 years 15 days ago
Robust Recognition of Documents by Fusing Results of Word Clusters
The word error rate of any optical character recognition system (OCR) is usually substantially below its component or character error rate. This is especially true of Indic langua...
Venkat Rasagna, Anand Kumar 0002, C. V. Jawahar, R...
NAACL
2003
13 years 7 months ago
A Generative Probabilistic OCR Model for NLP Applications
In this paper, we introduce a generative probabilistic optical character recognition (OCR) model that describes an end-to-end process in the noisy channel framework, progressing f...
Okan Kolak, William J. Byrne, Philip Resnik
KES
2005
Springer
13 years 11 months ago
An OCR Post-processing Approach Based on Multi-knowledge
This paper proposes an OCR post-processing approach based on multi-knowledge, which integrates language knowledge and candidate distance information given by the OCR engine. In thi...
Li Zhuang, Xiaoyan Zhu