We consider a model for which it is important, early in processing, to estimate some variables with high precision, but perhaps at relatively low recall. If some variables can be ...
Gary B. Huang, Andrew Kae, Carl Doersch, Erik G. L...
Optical character recognition (OCR) remains a difficult problem for noisy documents or documents not scanned at high resolution. Many current approaches rely on stored font models...
Andrew Kae, Gary Huang, Erik Learned-miller, Carl ...
A framework is presented for discovering partial duplicates in large collections of scanned books with optical character recognition (OCR) errors. Each book in the collection is r...