Sciweavers

ICML
2002
IEEE

Combining Labeled and Unlabeled Data for MultiClass Text Categorization

Supervised learning techniques for text classification often require a large number of labeled examples to learn accurately. One way to reduce the amount of labeled data required is to develop algorithms that can learn effectively from a small number of labeled examples augmented with a large number of unlabeled examples. Current text learning techniques for combining labeled and unlabeled, such as EM and Co-Training, are mostly applicable for classification tasks with a small number of classes and do not scale up well for large multiclass problems. In this paper, we develop a framework to incorporate unlabeled data in the Error-Correcting Output Coding (ECOC) setup by first decomposing multiclass problems into multiple binary problems and then using Co-Training to learn the individual binary classification problems. We show that our method is especially useful for text classification tasks involving a large number of categories and outperforms other semi-supervised learning techniques such as EM...
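The ECOC decomposition mentioned in the abstract can be illustrated with a small sketch. This is not the paper's code: the class names and the hand-picked code matrix below are hypothetical, and the binary learners themselves (which the paper trains with Co-Training) are not implemented here. The sketch only shows how a multiclass label set is turned into several binary problems and how a vector of binary predictions is decoded back into a class.

```python
from typing import Dict, List, Sequence

# Hypothetical code matrix: one codeword (row) per class, one column per
# binary problem. Any two codewords differ in at least 3 positions, so a
# single wrong binary prediction can still be decoded correctly.
CODE_MATRIX: Dict[str, List[int]] = {
    "sports":   [0, 0, 0, 0, 0],
    "politics": [0, 1, 1, 0, 1],
    "science":  [1, 0, 1, 1, 0],
    "business": [1, 1, 0, 1, 1],
}


def binary_labels(class_labels: Sequence[str], bit: int) -> List[int]:
    """Relabel multiclass examples for the binary problem at column `bit`."""
    return [CODE_MATRIX[c][bit] for c in class_labels]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions where two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(predicted_bits: Sequence[int]) -> str:
    """Return the class whose codeword is closest to the predicted bits."""
    return min(CODE_MATRIX, key=lambda c: hamming(CODE_MATRIX[c], predicted_bits))


if __name__ == "__main__":
    # Targets for binary problem 0, given three labeled documents.
    print(binary_labels(["sports", "science", "business"], 0))
    # Decoding a prediction vector in which one binary classifier erred.
    print(decode([0, 1, 1, 1, 1]))
```

In a full pipeline, each column would get its own binary classifier trained on labeled plus unlabeled documents, and `decode` would be applied to the classifiers' combined outputs.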
Rayid Ghani
Added 17 Nov 2009
Updated 17 Nov 2009
Type Conference
Year 2002
Where ICML
Authors Rayid Ghani