Sciweavers

KDD
2008
ACM

Get another label? improving data quality and data mining using multiple, noisy labelers

14 years 4 months ago
Get another label? improving data quality and data mining using multiple, noisy labelers
This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated labeling, and focus especially on the improvement of training labels for supervised induction. With the outsourcing of small tasks becoming easier, for example via Rent-A-Coder or Amazon's Mechanical Turk, it often is possible to obtain less-than-expert labeling at low cost. With low-cost labeling, preparing the unlabeled part of the data can become considerably more expensive than labeling. We present repeated-labeling strategies of increasing complexity, and show several main results. (i) Repeated-labeling can improve label quality and model quality, but not always. (ii) When labels are noisy, repeated labeling can be preferable to single labeling even in the traditional setting where labels are not particularly cheap. (iii) As soon as the cost of processing the unlabeled data is not free, even the simpl...
Victor S. Sheng, Foster J. Provost, Panagiotis G.
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2008
Where KDD
Authors Victor S. Sheng, Foster J. Provost, Panagiotis G. Ipeirotis
Comments (0)