One challenge in text processing is the treatment of case-insensitive documents, such as speech recognition results. The traditional approach is to re-train a language model exclud...
Cheng Niu, Wei Li 0003, Jihong Ding, Rohini K. Sri...
Amharic is the official language of Ethiopia and is written in the Ethiopic script. In this paper, we present writer-independent HMM-based Amharic word recognition for offline hand...
We present a class of richly structured, undirected hidden variable models suitable for simultaneously modeling text along with other attributes encoded in different modalities. O...
We present a new image compression technique called "DjVu" that is specifically geared towards the compression of scanned documents in color at high resolution. With DjVu, a ma...
The ability to find tables and extract information from them is a necessary component of data mining, question answering, and other information retrieval tasks. Documents often c...
David Pinto, Andrew McCallum, Xing Wei, W. Bruce C...