This paper describes experiments to establish the performance of a named entity recognition system which builds categorized lists of names from manually annotated training data. N...
In this paper we study the effectiveness of using a phrase-based representation in e-mail classification, and the affect this approach has on a number of machine learning algorithm...
In this paper, we present the AutoCat system for product classification. AutoCat uses a vector space model, modified to consider product attributes unavailable in traditional docu...
This paper studies the problem of identifying comparative sentences in text documents. The problem is related to but quite different from sentiment/opinion sentence identification...
ABSTRACT: OCR is an error-prone process. It is time-consuming and expensive to manually proofread OCR results. The errors remaining in OCRed texts can cause serious problems in rea...