ICML
2005
IEEE

A model for handling approximate, noisy or incomplete labeling in text classification

We introduce a Bayesian model, BayesANIL, that is capable of estimating uncertainties associated with the labeling process. Given a labeled or partially labeled training corpus of text documents, the model estimates the joint distribution of training documents and class labels by using a generalization of the Expectation Maximization algorithm. The estimates can be used in standard classification models to reduce error rates. Since uncertainties in the labeling are taken into account, the model provides an elegant mechanism to deal with noisy labels. We provide an intuitive modification to the EM iterations by re-estimating the empirical distribution in order to reinforce feature values in unlabeled data and to reduce the influence of noisily labeled examples. Considerable improvement in the classification accuracies of two popular classification algorithms on standard labeled data-sets with and without artificially introduced noise, as well as in the presence and absence of unlabeled...
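The general idea of running EM over uncertain labels can be illustrated with a minimal sketch. This is not the BayesANIL algorithm itself, whose details are not given in the abstract. It is a generic soft-label EM for a multinomial naive Bayes model, and it omits the re-estimation of the empirical distribution mentioned above. The function name `soft_label_em`, the `label_beliefs` input, and the toy data are all hypothetical choices made for illustration.

```python
import math


def soft_label_em(docs, label_beliefs, n_classes, vocab_size, n_iters=10, alpha=1.0):
    """Generic soft-label EM for multinomial naive Bayes (illustrative only).

    docs: list of token-id lists.
    label_beliefs: one row per document of strictly positive class weights;
        a peaked row encodes a (possibly noisy) label, a uniform row an
        unlabeled document.
    """
    # Start from the given label beliefs.
    resp = [list(b) for b in label_beliefs]
    for _ in range(n_iters):
        # M-step: smoothed class priors from the soft class memberships.
        class_mass = [sum(r[c] for r in resp) for c in range(n_classes)]
        total = sum(class_mass) + alpha * n_classes
        prior = [(m + alpha) / total for m in class_mass]

        # M-step: smoothed per-class word probabilities from soft counts.
        word_counts = [[alpha] * vocab_size for _ in range(n_classes)]
        for tokens, r in zip(docs, resp):
            for c in range(n_classes):
                for w in tokens:
                    word_counts[c][w] += r[c]
        word_prob = []
        for c in range(n_classes):
            z = sum(word_counts[c])
            word_prob.append([x / z for x in word_counts[c]])

        # E-step: class posterior per document, weighted by the original belief.
        new_resp = []
        for tokens, b in zip(docs, label_beliefs):
            log_p = []
            for c in range(n_classes):
                lp = math.log(prior[c]) + math.log(b[c])
                lp += sum(math.log(word_prob[c][w]) for w in tokens)
                log_p.append(lp)
            m = max(log_p)  # subtract the max for numerical stability
            unnorm = [math.exp(lp - m) for lp in log_p]
            z = sum(unnorm)
            new_resp.append([u / z for u in unnorm])
        resp = new_resp
    return resp, prior, word_prob


# Toy corpus: words 0 and 1 co-occur, as do words 2 and 3.
docs = [[0, 1, 0], [2, 3, 3], [0, 0, 1], [3, 2, 2]]
# Two documents carry uncertain labels, two are unlabeled (uniform belief).
label_beliefs = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5], [0.5, 0.5]]
resp, prior, word_prob = soft_label_em(docs, label_beliefs, n_classes=2, vocab_size=4)
```

In this sketch the unlabeled documents are pulled toward the class whose labeled documents share their vocabulary, while a label belief below 1 leaves room for the model to disagree with a noisy label.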
Added 17 Nov 2009
Updated 17 Nov 2009
Type Conference
Year 2005
Where ICML
Authors Ganesh Ramakrishnan, Krishna Prasad Chitrapura, Raghu Krishnapuram, Pushpak Bhattacharyya