Despite ubiquitous claims that optical character recognition (OCR) is a "solved problem," many categories of documents continue to break modern OCR software such as docu...
In style-constrained classification, there are often only a few samples of each style and class, and the correspondences between styles in the training set and the test set are un...
Optical character recognition (OCR) remains a difficult problem for noisy documents or documents not scanned at high resolution. Many current approaches rely on stored font models...
Andrew Kae, Gary Huang, Erik Learned-Miller, Carl ...
Models of latent document semantics such as the mixture of multinomials model and Latent Dirichlet Allocation have received substantial attention for their ability to discover top...
Daniel David Walker, William B. Lund, Eric K. Ring...
Prototype classifiers trained with a multi-class classification objective are inferior in pattern retrieval and outlier rejection. To improve the binary classification (detection, v...