Learning to Classify Texts Using Positive and Unlabeled Data

In traditional text classification, a classifier is built using labeled training documents of every class. This paper studies a different problem. Given a set P of documents of a particular class (called the positive class) and a set U of unlabeled documents that contains both positive documents and documents of other types (called negative documents), we want to build a classifier that separates the documents in U into positive and negative ones. The key feature of this problem is that there are no labeled negative documents, which makes traditional text classification techniques inapplicable. In this paper, we propose an effective technique to solve the problem. It combines the Rocchio method and the SVM technique for classifier building. Experimental results show that the new method significantly outperforms existing methods.
Xiaoli Li, Bing Liu
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2003