Identifying topics and concepts associated with a set of documents is a task common to many applications. It can help in the annotation and categorization of documents and be used...
Writer-independent handwriting recognition systems are limited in their accuracy, primarily due to the large variations in writing styles of most characters. Samples from a single ch...
The goal of this work is to add the capability to segment documents containing text, graphics, and pictures in the open-source OCR engine OCRopus. To achieve this goal, OCRopus'...
Amy Winder, Tim L. Andersen, Elisa H. Barney Smith
In this paper, we investigate structured models for document-level sentiment classification. When predicting the sentiment of a subjective document (e.g., as positive or negative)...
The quality of a statistical machine translation (SMT) system is heavily dependent upon the number of parallel sentences used in training. In recent years, there have been several...