A Comparison of Models for Cost-Sensitive Active Learning

Active Learning (AL) is a selective sampling strategy that has been shown to be particularly cost-efficient, drastically reducing the amount of training data that must be manually annotated. For the annotation of natural language data, cost efficiency is usually measured in terms of the number of tokens to be considered. This measure assumes uniform costs for all tokens involved; at least from a linguistic perspective, it is intrinsically inadequate and should be replaced by a more adequate cost indicator, viz. the time it takes to manually label the selected annotation examples. Here we propose three different approaches to incorporating costs into the AL selection mechanism and evaluate them on the MUC7T corpus, an extension of the MUC7 newspaper corpus that contains such annotation time information. Our experiments reveal that, using a cost-sensitive version of semi-supervised AL, up to 54% of true annotation time can be saved compared to random selection.
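The abstract does not spell out the three approaches. As a rough illustration of what incorporating costs into the selection mechanism can mean, the Python sketch below contrasts ranking candidates by utility per token (the uniform-cost assumption criticized above) with ranking by utility per predicted second of annotation time. This benefit/cost ratio is one common scheme for cost-sensitive AL and is not necessarily any of the paper's three approaches. The `Example` fields, the utility scores, and the time estimates are hypothetical placeholders; in practice the utility might come from, e.g., committee disagreement and the time estimate from a learned cost model. The `time_saving` helper assumes that savings are measured as the relative reduction in annotation time needed to reach the same target performance as random selection.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """A candidate unlabeled example (e.g., a sentence) in the AL pool.

    All fields are hypothetical placeholders for illustration.
    """
    ident: str
    num_tokens: int
    utility: float            # assumed informativeness score
    predicted_seconds: float  # assumed estimate of manual annotation time


def select_uniform_token_cost(pool: list[Example], k: int) -> list[Example]:
    """Pick the k examples with the highest utility per token,
    i.e. assume every token costs the same to annotate."""
    return sorted(pool, key=lambda e: e.utility / e.num_tokens, reverse=True)[:k]


def select_time_cost(pool: list[Example], k: int) -> list[Example]:
    """Pick the k examples with the highest utility per predicted
    second of annotation time (a benefit/cost ratio)."""
    return sorted(pool, key=lambda e: e.utility / e.predicted_seconds, reverse=True)[:k]


def time_saving(al_seconds: float, random_seconds: float) -> float:
    """Fraction of annotation time saved by AL relative to random
    selection, assuming both reach the same target performance."""
    return 1.0 - al_seconds / random_seconds


if __name__ == "__main__":
    # Toy pool with invented numbers, chosen only to show that the two
    # cost measures can prefer different examples.
    pool = [
        Example("a", num_tokens=10, utility=0.9, predicted_seconds=40.0),
        Example("b", num_tokens=20, utility=1.2, predicted_seconds=20.0),
        Example("c", num_tokens=5, utility=0.25, predicted_seconds=15.0),
    ]
    print([e.ident for e in select_uniform_token_cost(pool, 1)])  # ['a']
    print([e.ident for e in select_time_cost(pool, 1)])           # ['b']
    print(time_saving(30.0, 60.0))                                # 0.5
```

In this toy pool, example "a" looks cheapest per token but is slow to annotate, so a time-based cost indicator ranks the faster example "b" first. This is the kind of divergence that motivates replacing token counts with annotation time.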
Katrin Tomanek, Udo Hahn
Added 13 May 2011
Updated 13 May 2011
Type Journal
Year 2010