Models of latent document semantics such as the mixture of multinomials model and Latent Dirichlet Allocation have received substantial attention for their ability to discover top...
Daniel David Walker, William B. Lund, Eric K. Ring...
In this paper, stochastic error-correcting parsing is proposed as a powerful and flexible method to post-process the results of an optical character recognizer (OCR). Determinist...
The word error rate of any optical character recognition system (OCR) is usually substantially below its component or character error rate. This is especially true of Indic langua...
Venkat Rasagna, Anand Kumar 0002, C. V. Jawahar, R...
In this paper, we introduce a generative probabilistic optical character recognition (OCR) model that describes an end-to-end process in the noisy channel framework, progressing f...
This paper proposes an OCR post-processing approach based on multi-knowledge, which integrates language knowledge and candidate distance information given by the OCR engine. In thi...