Learning to Identify Unexpected Instances in the Test Set

Traditional classification involves building a classifier from labeled training examples of a set of predefined classes and then applying it to assign test instances to the same set of classes. In practice, this paradigm can be problematic because the test data may contain instances that do not belong to any of the previously defined classes. Detecting such unexpected instances in the test set is an important practical issue. The problem can be formulated as learning from positive and unlabeled examples (PU learning). However, current PU learning algorithms require a large proportion of negative instances in the unlabeled set to be effective. This paper proposes a novel technique to solve this problem in the text classification domain. The technique first generates a single artificial negative document AN. The sets P and {AN} are then used to build a naïve Bayesian classifier. Our experimental results show that this method is significantly better than existing tech...
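The two steps named in the abstract (generate one artificial negative document AN, then train a naïve Bayesian classifier on P and {AN}) can be illustrated with a minimal sketch. The abstract does not say how AN is generated, so the sketch assumes AN is formed by pooling the word counts of the unlabeled set; that choice, the function names (`make_artificial_negative`, `train_nb`, `is_unexpected`), the Laplace smoothing, the equal class priors, and the toy documents are all illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenizer; a real system would use proper text preprocessing.
    return text.lower().split()


def make_artificial_negative(unlabeled_docs):
    # ASSUMPTION: AN is built by pooling all tokens of the unlabeled set
    # into one pseudo-document. The abstract does not specify this step.
    an_counts = Counter()
    for doc in unlabeled_docs:
        an_counts.update(tokenize(doc))
    return an_counts


def train_nb(positive_docs, an_counts, prior_pos=0.5):
    # Multinomial naive Bayes with two classes: P ("pos") and {AN} ("neg").
    pos_counts = Counter()
    for doc in positive_docs:
        pos_counts.update(tokenize(doc))
    vocab = set(pos_counts) | set(an_counts)

    def log_probs(counts):
        # Laplace (add-one) smoothing over the shared vocabulary.
        total = sum(counts.values()) + len(vocab)
        return {w: math.log((counts[w] + 1) / total) for w in vocab}

    return {
        "vocab": vocab,
        "log_prior": {"pos": math.log(prior_pos), "neg": math.log(1 - prior_pos)},
        "log_lik": {"pos": log_probs(pos_counts), "neg": log_probs(an_counts)},
    }


def is_unexpected(model, doc):
    # A test document is flagged as unexpected when the AN class scores higher.
    scores = dict(model["log_prior"])
    for w in tokenize(doc):
        if w in model["vocab"]:  # out-of-vocabulary words are ignored
            for label in scores:
                scores[label] += model["log_lik"][label][w]
    return scores["neg"] > scores["pos"]


# Toy data (hypothetical): P holds finance documents, U mixes finance and sports.
P = ["stock market shares rise", "market trading shares fall"]
U = [
    "stock market rally",
    "football match goal score",
    "team wins football cup",
    "shares trading volume",
]
model = train_nb(P, make_artificial_negative(U))
```

With this toy data, `is_unexpected(model, "football team goal")` flags the sports document, while `is_unexpected(model, "market shares rise")` does not, because words frequent in P receive higher likelihood under the positive class than under AN.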
Xiaoli Li, Bing Liu, See-Kiong Ng
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2007