
KDD
2009
ACM

Efficiently learning the accuracy of labeling sources for selective sampling

Many scalable data mining tasks rely on active learning to provide the most useful, accurately labeled instances. However, what if there are multiple labeling sources ('oracles' or 'experts') with different but unknown reliabilities? With the recent advent of inexpensive and scalable online annotation tools, such as Amazon's Mechanical Turk, the labeling process has become more vulnerable to noise, and there is no prior knowledge of the accuracy of each individual labeler. This paper addresses exactly this challenge: how to jointly learn the accuracy of labeling sources and obtain the most informative labels for the active learning task at hand while minimizing total labeling effort. More specifically, we present IEThresh (Interval Estimate Threshold), a strategy to intelligently select the expert(s) with the highest estimated labeling accuracy. IEThresh estimates a confidence interval for the reliability of each expert and filters out the one(s) whose estimated upper-bound c...
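The abstract is cut off before it states IEThresh's exact filtering rule, so what follows is only a hedged sketch of interval-estimate-based expert selection under stated assumptions, not the authors' implementation. It assumes that (1) each expert's reward history records 1 when its label agreed with the majority vote of the experts queried together and 0 otherwise, (2) the confidence interval uses a normal quantile in place of a Student's t quantile, and (3) an expert is kept when its upper bound is at least a fraction `epsilon` of the best upper bound. The names `upper_bound`, `select_experts`, `update`, and `epsilon` are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist, mean, stdev


def upper_bound(rewards, alpha=0.05):
    """Upper end of a two-sided (1 - alpha) interval on an expert's mean reward.

    Assumption: a normal quantile stands in for the Student's t quantile
    so the sketch needs only the standard library.
    """
    n = len(rewards)
    if n < 2:
        # Too little evidence to estimate spread: be maximally optimistic
        # so that a new expert still gets queried (exploration).
        return float("inf")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return mean(rewards) + z * stdev(rewards) / sqrt(n)


def select_experts(history, epsilon=0.8, alpha=0.05):
    """Keep experts whose upper bound is at least epsilon times the best one.

    `epsilon` is a hypothetical threshold parameter assumed for this sketch.
    """
    bounds = {name: upper_bound(r, alpha) for name, r in history.items()}
    best = max(bounds.values())
    return sorted(name for name, ub in bounds.items() if ub >= epsilon * best)


def update(history, selected, labels):
    """Majority-vote the selected experts' labels, then credit each selected
    expert with reward 1 if it agreed with the vote and 0 otherwise
    (an assumed reward definition). Returns the majority label."""
    majority, _ = Counter(labels[name] for name in selected).most_common(1)[0]
    for name in selected:
        history[name].append(1 if labels[name] == majority else 0)
    return majority
```

For example, with reward histories `{"a": [1, 1, 1, 1, 0], "b": [0, 1, 0, 0, 1], "c": [1, 1, 1, 1, 1]}` and the default parameters, `select_experts` keeps `['a', 'c']`. Expert `a` outranks the perfectly agreeing `c` because its wider interval pushes its upper bound higher; this optimism toward uncertain experts is what lets an interval-estimate rule keep exploring while still filtering out a consistently less reliable expert such as `b`.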
Pinar Donmez, Jaime G. Carbonell, Jeff Schneider
Added 25 Nov 2009
Updated 25 Nov 2009
Type Conference
Year 2009
Where KDD