LREC
2010

A Named Entity Labeler for German: Exploiting Wikipedia and Distributional Clusters

Named Entity Recognition (NER) is a relatively well-understood NLP task, with many publicly available training resources and software tools for English. Other languages tend to be underserved in this area. For German, CoNLL-2003 provides training data, but no ready-to-use tools are publicly available. We fill this gap by developing a German NER system with state-of-the-art performance. Besides the CoNLL-2003 labeled training data, we use two further resources: (i) 32 million words of unlabeled text and (ii) infobox labels in German Wikipedia articles. We extract informative word-type features from these resources and train a supervised model on the labeled training data. This approach lets us handle word-types unseen in the training data more robustly and achieve state-of-the-art performance on German with little engineering effort.
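The idea of attaching word-type features from external resources to each token can be illustrated with a minimal sketch. It is not the authors' implementation: the lookup tables `WORD_CLUSTERS` and `WIKI_LABELS`, their contents, the cluster IDs, and the feature names are all hypothetical stand-ins for features that would be derived from unlabeled text and from Wikipedia infoboxes. The abstract does not specify the feature set or the supervised model, so the sketch stops at feature extraction.

```python
from typing import Dict, List

# Hypothetical lookup tables (assumptions for illustration only).
# In a real system, cluster IDs would be induced from unlabeled text
# and entity labels would be harvested from Wikipedia infoboxes.
WORD_CLUSTERS: Dict[str, str] = {"Berlin": "c17", "München": "c17", "Siemens": "c42"}
WIKI_LABELS: Dict[str, str] = {"Berlin": "LOC", "Siemens": "ORG"}


def token_features(tokens: List[str], i: int) -> Dict[str, str]:
    """Build a feature dictionary for the token at position i."""
    word = tokens[i]
    feats = {
        "word": word.lower(),
        "is_capitalized": str(word[:1].isupper()),
        "suffix3": word[-3:].lower(),
        # Word-type features from external resources; they remain
        # available even if the word never occurs in the labeled data.
        "cluster": WORD_CLUSTERS.get(word, "UNK"),
        "wiki_label": WIKI_LABELS.get(word, "NONE"),
    }
    # Simple context features with sentence-boundary markers.
    feats["prev_word"] = tokens[i - 1].lower() if i > 0 else "<S>"
    feats["next_word"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "</S>"
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, str]]:
    """Extract features for every token in a sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


sentence = ["Angela", "wohnt", "in", "München"]
features = sentence_features(sentence)
```

In this toy setup, "München" has no Wikipedia label but shares the hypothetical cluster `c17` with "Berlin", so a model that saw only "Berlin" labeled as a location in training could still pick up a useful signal for the unseen word. The resulting feature dictionaries could be passed to any supervised sequence labeler.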
Grzegorz Chrupala, Dietrich Klakow
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2010
Where LREC
Authors Grzegorz Chrupala, Dietrich Klakow