Sciweavers

KCAP
2009
ACM

Reducing class imbalance during active learning for named entity annotation

In many natural language processing tasks, the classes of interest are heavily imbalanced in the underlying data set, and classifiers trained on such skewed data tend to perform poorly on low-frequency classes. We introduce and compare different approaches to reduce class imbalance by design within the context of active learning (AL). Our goal is to compile more balanced data sets up front, at annotation time, when AL is used as a strategy to acquire training material. We situate our approach in the context of named entity recognition. Our experiments reveal that we can indeed reduce class imbalance and increase the performance of classifiers on minority classes while preserving good overall performance in terms of macro F-score.
Categories and Subject Descriptors: I.2.6 [Computing Methodologies]: Artificial Intelligence - Learning; I.2.7 [Computing Methodologies]: Artificial Intelligence - Natural Language Processing
General Terms: Algorithms, Design,...
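The abstract reports overall performance as macro F-score, which averages per-class F-scores with equal weight, so a classifier that ignores a minority class is penalized even when its accuracy looks high. The following is a minimal sketch of that idea, not the authors' evaluation code: it scores single labels rather than entity spans (named entity evaluation is usually done on whole entity mentions), and the function names and the toy PER/LOC data are hypothetical.

```python
from collections import Counter


def per_class_f1(gold, pred):
    """Return a dict mapping each label to its F1 score (label-level, simplified)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores)


# Hypothetical skewed data: 8 PER labels, 2 LOC labels.
gold = ["PER"] * 8 + ["LOC"] * 2
# A classifier that always predicts the majority class.
pred = ["PER"] * 10

accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(accuracy)              # 8 of 10 labels are correct
print(macro_f1(gold, pred))  # much lower, because LOC has F1 = 0
```

On this toy data the majority-class predictor is right on 8 of 10 labels, yet its macro F-score is below 0.5 because the minority class contributes an F1 of zero. That sensitivity is why macro F-score is a natural measure when the aim is better performance on minority classes.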
Katrin Tomanek, Udo Hahn
Added 28 May 2010
Updated 28 May 2010
Type Conference
Year 2009
Where KCAP
Authors Katrin Tomanek, Udo Hahn