Active cleaning of label noise

Mislabeled examples in the training data can severely degrade the performance of supervised classifiers. In this paper, we present an approach to removing mislabeled examples from a dataset by selecting suspicious examples as targets for inspection. We show that the large-margin and soft-margin principles used in support vector machines (SVMs) tend to capture mislabeled examples as support vectors. Experimental results on two character recognition datasets show that one-class and two-class SVMs capture around 85% and 99% of label-noise examples, respectively, as their support vectors. We also propose a new method that iteratively builds two-class SVM classifiers on the non-support-vector examples from the training data, after which an expert manually verifies the support vectors, based on their classification score, to identify any mislabeled examples. We show that this method reduces the number of examples to be reviewed, as well as the paramet...
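The central observation of the abstract, that a soft-margin SVM tends to hold mislabeled examples among its support vectors, can be illustrated with a small sketch. This is not the authors' implementation: it swaps in a linear SVM trained by plain full-batch subgradient descent on the hinge loss, and the toy dataset, the flipped indices, the hyperparameters, and the names `make_data`, `train_linear_svm`, and `review_queue` are all assumptions made for illustration.

```python
# Toy illustration (assumed setup, not the paper's experiments):
# flipped labels end up among the support vectors of a soft-margin SVM.

def make_data():
    """Two well-separated 5x5 grids of 2-D points, labeled +1 and -1."""
    offsets = [-1.0, -0.5, 0.0, 0.5, 1.0]
    points, labels = [], []
    for center, label in ((2.0, 1), (-2.0, -1)):
        for dx in offsets:
            for dy in offsets:
                points.append((center + dx, center + dy))
                labels.append(label)
    return points, labels

def train_linear_svm(points, labels, lam=0.01, lr=0.05, epochs=500):
    """Full-batch subgradient descent on 0.5*lam*|w|^2 + mean hinge loss."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w = [lam * w[0], lam * w[1]]
        grad_b = 0.0
        for (x1, x2), y in zip(points, labels):
            # Only examples inside the margin (or misclassified) contribute.
            if y * (w[0] * x1 + w[1] * x2 + b) < 1.0:
                grad_w[0] -= y * x1 / n
                grad_w[1] -= y * x2 / n
                grad_b -= y / n
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b

points, clean_labels = make_data()

# Hypothetical label noise: flip four labels deep inside the two clusters.
flipped = {12, 18, 31, 37}
noisy_labels = [-y if i in flipped else y for i, y in enumerate(clean_labels)]

w, b = train_linear_svm(points, noisy_labels)

# Functional margin y * f(x) of every training example under the noisy labels.
margins = [y * (w[0] * x1 + w[1] * x2 + b)
           for (x1, x2), y in zip(points, noisy_labels)]

# In a soft-margin SVM the support vectors are the examples with margin <= 1.
support = [i for i, m in enumerate(margins) if m <= 1.0 + 1e-9]

# Review order for the human expert: most strongly violated margin first.
review_queue = sorted(support, key=lambda i: margins[i])

print("support vectors:", len(support), "of", len(points))
print("flipped labels found among support vectors:",
      len(flipped & set(support)), "of", len(flipped))
print("first examples to review:", review_queue[:len(flipped)])
```

In this toy setting, only the support vectors would need to be shown to an expert, and sorting them by classification score puts the most suspicious ones first. The iterative method described in the abstract goes further by retraining on the non-support-vector examples; that step is not reproduced here because the abstract does not give its details.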
Added 09 Apr 2016
Updated 09 Apr 2016
Type Journal
Year 2016
Where Pattern Recognition (PR)
Authors Ekambaram Rajmadhan, Sergiy Fefilatyev, Matthew Shreve, Kurt Kramer, Lawrence O. Hall, Dmitry B. Goldgof, Rangachar Kasturi