
KDD
2008
ACM

Semi-supervised approach to rapid and reliable labeling of large data sets

Supervised classification methods have been shown to be very effective in a large number of applications. They require a training data set whose instances are labeled to indicate the correct class assignment. In many rapidly changing fields, such as computer network traffic analysis, up-to-date labeled data sets are scarce. This is primarily a consequence of the excessively high cost of having an expert manually label these large data sets. In this research, we propose a method in which the data set is labeled in a semi-supervised manner, with user-specified guarantees about the quality of the labeling. In our scheme, we assume that for each class we have some heuristics available, each of which can identify instances of one particular class. The heuristics are assumed to have reasonable performance, but they need not cover all instances of the class, nor do they need to be perfectly reliable. We further assume that we have an infallible expert,...
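The abstract is cut off before it explains how the expert is used, so the following is only a minimal sketch of the general setup under our own assumptions, not the authors' algorithm. Per-class heuristics assign tentative labels (instances matched by no class, or by several classes, stay unlabeled), and an expert verifies a random sample of one class's labels so that a Wilson score lower bound on precision can be compared against a user-specified target. The record fields, the toy heuristics, and the functions `apply_heuristics`, `wilson_lower_bound`, and `check_class` are all hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Optional

# Hypothetical type: a heuristic is a predicate that fires on instances it
# believes belong to one particular class. It may miss instances (incomplete
# coverage) and may fire on wrong ones (imperfect reliability).
Heuristic = Callable[[dict], bool]


def apply_heuristics(instances: List[dict],
                     heuristics: Dict[str, List[Heuristic]]) -> List[Optional[str]]:
    """Tentatively label each instance with the single class whose heuristics fire.

    Instances on which no class fires, or on which heuristics of several
    classes fire, are left unlabeled (None).
    """
    labels: List[Optional[str]] = []
    for x in instances:
        fired = {cls for cls, hs in heuristics.items() if any(h(x) for h in hs)}
        labels.append(next(iter(fired)) if len(fired) == 1 else None)
    return labels


def wilson_lower_bound(successes: int, n: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial proportion."""
    if n == 0:
        return 0.0
    p = successes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom


def check_class(instances, labels, cls, expert, sample_size, target, rng):
    """Ask the (simulated) expert to verify a random sample of instances
    tentatively labeled `cls`; return the precision lower bound and whether
    it meets the user-specified `target`."""
    idx = [i for i, lab in enumerate(labels) if lab == cls]
    sample = rng.sample(idx, min(sample_size, len(idx)))
    correct = sum(expert(instances[i]) == cls for i in sample)
    lb = wilson_lower_bound(correct, len(sample))
    return lb, lb >= target


# Toy "network traffic" records; the "true" field stands in for the expert.
traffic = ([{"port": 80, "true": "web"}] * 40
           + [{"port": 53, "true": "dns"}] * 30
           + [{"port": 22, "true": "ssh"}] * 10)  # no heuristic covers ssh
heuristics = {
    "web": [lambda x: x["port"] in (80, 443)],
    "dns": [lambda x: x["port"] == 53],
}
labels = apply_heuristics(traffic, heuristics)
lb, ok = check_class(traffic, labels, "web", lambda x: x["true"],
                     sample_size=20, target=0.8, rng=random.Random(0))
print(labels.count(None), round(lb, 3), ok)
```

Here the expert checks 20 of the 40 "web" records and finds all of them correct, so the bound is 20/(20 + 1.96²) ≈ 0.839, which clears the 0.8 target; the 10 ssh records stay unlabeled because no heuristic covers them. The Wilson interval is used only because it behaves sensibly for small samples and proportions near 1; the paper may rely on a different guarantee.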
György J. Simon, Vipin Kumar, Zhi-Li Zhang
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2008
Where KDD
Authors György J. Simon, Vipin Kumar, Zhi-Li Zhang