Sciweavers

WWW
2006
ACM

Large-scale text categorization by batch mode active learning

Large-scale text categorization is an important research topic for Web data mining. One of its challenges is how to reduce the human effort required to label text documents for building reliable classification models. There have been many studies on applying active learning methods to automatic text categorization, which try to select the most informative documents for manual labeling. Most of these studies focused on selecting a single unlabeled document in each iteration. As a result, the text categorization model has to be retrained after each labeled document is solicited. In this paper, we present a novel active learning algorithm that selects a batch of text documents for manual labeling in each iteration. The key to batch mode active learning is reducing the redundancy among the selected examples so that each example provides unique information for model updating. To this end, we use the Fisher information matr...
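The idea of choosing a batch whose members do not repeat one another can be illustrated with a small sketch. This is not the paper's method: the abstract is cut off before its Fisher-information criterion is described, so the code below swaps in a simple greedy heuristic (uncertainty minus a cosine-similarity penalty). The names `select_batch`, `uncertainty`, `cosine`, and `redundancy_weight` are hypothetical and exist only for this illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def uncertainty(p):
    """Binary-classifier uncertainty: 1.0 at p = 0.5, falling to 0.0 at p = 0 or 1."""
    return 1.0 - 2.0 * abs(p - 0.5)

def select_batch(vectors, probs, batch_size, redundancy_weight=1.0):
    """Greedily pick indices of unlabeled documents for one labeling round.

    Assumed heuristic (not the paper's Fisher-information criterion):
    score = uncertainty - redundancy_weight * (max similarity to documents
    already chosen for this batch).
    """
    selected = []
    candidates = set(range(len(vectors)))
    while candidates and len(selected) < batch_size:
        def score(i):
            # Penalize documents that resemble something already in the batch.
            penalty = max(
                (cosine(vectors[i], vectors[j]) for j in selected),
                default=0.0,
            )
            return uncertainty(probs[i]) - redundancy_weight * penalty
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy data: documents 0 and 1 are near-duplicates and both highly uncertain.
docs = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [1.0, 1.0]]
probs = [0.5, 0.52, 0.6, 0.9]
print(select_batch(docs, probs, batch_size=2))
print(select_batch(docs, probs, batch_size=2, redundancy_weight=0.0))
```

With the redundancy penalty switched off, the toy example picks the two near-duplicate documents; with it on, the second pick moves to a dissimilar document, which is the kind of redundancy the abstract says a batch selector should avoid.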
Steven C. H. Hoi, Rong Jin, Michael R. Lyu
Added 22 Nov 2009
Updated 22 Nov 2009
Type Conference
Year 2006
Where WWW