Learning to Classify Text from Labeled and Unlabeled Documents

In many important text classification problems, acquiring class labels for training documents is costly, while gathering large quantities of unlabeled data is cheap. This paper shows that the accuracy of text classifiers trained with a small number of labeled documents can be improved by augmenting this small training set with a large pool of unlabeled documents. We present a theoretical argument showing that, under common assumptions, unlabeled data contain information about the target function. We then introduce an algorithm for learning from labeled and unlabeled text based on the combination of Expectation-Maximization with a naive Bayes classifier. The algorithm first trains a classifier using the available labeled documents, and probabilistically labels the unlabeled documents; it then trains a new classifier using the labels for all the documents, and iterates to convergence. Experimental results, obtained using text from three different real-world tasks, show that the use of un...
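
The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a multinomial naive Bayes model with Laplace smoothing, documents given as lists of word tokens, and a stopping rule based on how much the soft labels change between iterations. The names train_nb, posterior, and em_naive_bayes and the parameters alpha, max_iter, and tol are hypothetical.

```python
import math
from collections import Counter


def train_nb(docs, weights, vocab, n_classes, alpha=1.0):
    """Estimate naive Bayes parameters from documents with soft class weights."""
    class_mass = [alpha] * n_classes          # smoothed class totals
    word_counts = [Counter() for _ in range(n_classes)]
    for doc, w in zip(docs, weights):
        for c in range(n_classes):
            class_mass[c] += w[c]
            for word in doc:
                word_counts[c][word] += w[c]  # fractional counts for soft labels
    total = sum(class_mass)
    log_prior = [math.log(m / total) for m in class_mass]
    log_word = []
    for c in range(n_classes):
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_word.append(
            {v: math.log((word_counts[c][v] + alpha) / denom) for v in vocab}
        )
    return log_prior, log_word


def posterior(doc, model):
    """Return P(class | doc) under the model; words outside the vocabulary are skipped."""
    log_prior, log_word = model
    scores = [
        lp + sum(lw[w] for w in doc if w in lw)
        for lp, lw in zip(log_prior, log_word)
    ]
    top = max(scores)                         # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def em_naive_bayes(labeled, unlabeled, n_classes, max_iter=20, tol=1e-6):
    """labeled: list of (tokens, class_index); unlabeled: list of token lists."""
    vocab = {w for doc, _ in labeled for w in doc}
    vocab |= {w for doc in unlabeled for w in doc}
    lab_docs = [doc for doc, _ in labeled]
    # Labeled documents keep fixed one-hot weights throughout.
    lab_w = [[1.0 if c == y else 0.0 for c in range(n_classes)] for _, y in labeled]

    # Initial classifier: labeled documents only.
    model = train_nb(lab_docs, lab_w, vocab, n_classes)
    prev = None
    for _ in range(max_iter):
        # E-step: probabilistically label the unlabeled documents.
        soft = [posterior(doc, model) for doc in unlabeled]
        # M-step: retrain on all documents using hard and soft labels together.
        model = train_nb(lab_docs + unlabeled, lab_w + soft, vocab, n_classes)
        if prev is not None:
            change = max(
                (abs(a - b) for p, q in zip(soft, prev) for a, b in zip(p, q)),
                default=0.0,
            )
            if change < tol:                  # assumed stopping rule
                break
        prev = soft
    return model
```

In this sketch, a word that never appears in a labeled document can still become evidence for a class if it co-occurs with labeled words in the unlabeled pool, which is the intuition behind using unlabeled data here.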

Kamal Nigam, Andrew McCallum, Sebastian Thrun, Tom M. Mitchell

AAAI 1998 | Available Labeled Documents | Intelligent Agents | Unlabeled Data | Unlabeled Documents |

» Text Classification from Labeled and Unlabeled Documents using EM

» Employing EM and Pool-Based Active Learning for Text Classification

» Active Learning Strategies for Multi-Label Text Classification

» Mining Relevant Text from Unlabelled Documents

» Cross-training: learning probabilistic mappings between topics

» Learning from labeled features using generalized expectation criteria

» Semi-Supervised Text Classification Using Partitioned EM

» Text Classification by Labeling Words

Post Info

Added	01 Nov 2010
Updated	01 Nov 2010
Type	Conference
Year	1998
Where	AAAI
Authors	Kamal Nigam, Andrew McCallum, Sebastian Thrun, Tom M. Mitchell
