Sciweavers

CIKM
2009
Springer

Combining labeled and unlabeled data with word-class distribution learning

We describe a novel, simple, and highly scalable semi-supervised method called Word-Class Distribution Learning (WCDL), and apply it to the task of information extraction (IE), using unlabeled sentences to improve supervised classification methods. WCDL iteratively builds a class label distribution for each word in the dictionary by averaging the predicted labels over all of that word's occurrences in the unlabeled corpus, then re-trains a base classifier with these distributions added as word features. In contrast, traditional self-training or co-training methods add self-labeled examples (rather than features), which can degrade performance due to incestuous learning bias. WCDL exhibits robust behavior and has no difficult parameters to tune. We applied our method to German and English named entity recognition (NER) tasks. WCDL shows improvements over self-training, multi-task semi-supervision, or supervision alone, in particular yielding a state-of-the-art 75.72 F1 score on the German NER task. Categories and...
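The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration based only on the abstract, not the authors' implementation: the names `word_class_distributions`, `wcdl`, `train`, and `predict_proba` are hypothetical, the fixed iteration count is an assumption, and predictions are assumed to be per-class probability vectors (a one-hot vector would correspond to averaging hard labels).

```python
from collections import defaultdict


def word_class_distributions(sentences, predict_proba, num_classes):
    """Average the predicted class vectors over every occurrence of each word.

    `predict_proba(sentence, i)` is a hypothetical callable returning a list of
    `num_classes` scores for the token at position `i` of `sentence`.
    """
    sums = defaultdict(lambda: [0.0] * num_classes)
    counts = defaultdict(int)
    for sentence in sentences:
        for i, word in enumerate(sentence):
            probs = predict_proba(sentence, i)
            for c in range(num_classes):
                sums[word][c] += probs[c]
            counts[word] += 1
    return {w: [s / counts[w] for s in sums[w]] for w in sums}


def wcdl(labeled, unlabeled, train, num_classes, iterations=3):
    """Alternate between re-training and rebuilding the word features.

    `train(labeled, word_features)` is a hypothetical callable that fits the
    base classifier with the given per-word distributions as extra features
    and returns a `predict_proba` function. The iteration count is assumed.
    """
    dists = {}  # first round: base classifier without distribution features
    for _ in range(iterations):
        predict_proba = train(labeled, dists)
        dists = word_class_distributions(unlabeled, predict_proba, num_classes)
    return train(labeled, dists), dists
```

Note that only the word features change between rounds; the labeled training set is never extended with self-labeled examples, which is the contrast with self-training that the abstract draws.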
Yanjun Qi, Ronan Collobert, Pavel Kuksa, Koray Kavukcuoglu, Jason Weston
Added 26 May 2010
Updated 26 May 2010
Type Conference
Year 2009
Where CIKM
Authors Yanjun Qi, Ronan Collobert, Pavel Kuksa, Koray Kavukcuoglu, Jason Weston