
COLING
2010


Improving Name Origin Recognition with Context Features and Unlabelled Data


Download acl.eldoc.ub.rug.nl

We demonstrate the use of context features, namely names of places, and unlabelled data for detecting the language of origin of personal names. While some early work used either rule-based methods or n-gram statistical models to determine the language of origin of a name, we treat the task as classification and use a discriminative maximum entropy model. We bootstrap the learning using a list of names out of context but with known origin, and then use the expectation-maximisation algorithm to further train the model on a large corpus of names of unknown origin but with context features. Using a relatively small unlabelled corpus, we improve the accuracy of name origin recognition for names written in Chinese from 82.7% to 85.8%, a significant reduction in the error rate. The improvement in F-score for infrequent Japanese names is even greater: from 77.4% without context features to 82.8% with context features.
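The two-stage training described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not specify the feature set, the optimiser, or the exact form of the EM updates. The sketch assumes character-bigram features over romanised toy names (the paper works with names written in Chinese), a small softmax (maximum entropy) classifier trained by plain gradient ascent, and an EM-style loop in which the E-step assigns soft origin labels to unlabelled names and the M-step refits the model on seed plus soft-labelled data. All names, labels, and data below are hypothetical.

```python
import math
from collections import defaultdict

LABELS = ["ja", "zh"]  # hypothetical origin labels for the toy data

def name_features(name):
    """Character bigrams of a romanised name (an illustrative feature choice)."""
    padded = f"^{name.lower()}$"
    return [f"bi={padded[i:i + 2]}" for i in range(len(padded) - 1)]

def one_hot(label):
    """Target distribution that puts all mass on the known origin."""
    return [1.0 if label == y else 0.0 for y in LABELS]

def predict(weights, feats):
    """P(origin | features) under a maximum entropy (softmax) model."""
    scores = [sum(weights.get((f, y), 0.0) for f in feats) for y in LABELS]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(examples, epochs=200, lr=0.5, l2=0.01):
    """Batch gradient ascent on the L2-regularised expected log-likelihood.
    Each example is (features, target distribution over LABELS)."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, target in examples:
            probs = predict(weights, feats)
            for k, y in enumerate(LABELS):
                for f in feats:
                    grad[(f, y)] += target[k] - probs[k]
        for key, g in grad.items():
            weights[key] += lr * (g / len(examples) - l2 * weights[key])
    return weights

def em_train(labelled, unlabelled, rounds=3):
    """Bootstrap on context-free labelled names, then refine with EM-style
    soft labels on unlabelled names that carry a place-name context."""
    seed = [(name_features(n), one_hot(y)) for n, y in labelled]
    weights = train(seed)
    for _ in range(rounds):
        # E-step: soft origin labels from the current model.
        soft = []
        for name, place in unlabelled:
            feats = name_features(name) + [f"ctx={place}"]
            soft.append((feats, predict(weights, feats)))
        # M-step: refit on seed data plus the soft-labelled data.
        weights = train(seed + soft)
    return weights

# Toy data, invented for illustration only.
labelled = [("tanaka", "ja"), ("suzuki", "ja"), ("yamada", "ja"),
            ("wang", "zh"), ("zhang", "zh"), ("liu", "zh")]
unlabelled = [("nakamura", "tokyo"), ("yamamoto", "tokyo"),
              ("zhao", "beijing"), ("huang", "beijing")]

weights = em_train(labelled, unlabelled)
print(dict(zip(LABELS, predict(weights, ["ctx=tokyo"]))))
```

In the first E-step the context features have no weight yet, so the soft labels come from the name features alone; the M-step then learns context weights from the places that co-occur with confidently classified names. That is one plausible reading of how unlabelled data with context could help on infrequent names, offered here as an assumption.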

Vladimir Pervouchine, Min Zhang, Ming Liu, Haizhou Li


COLING 2010 | Computational Linguistics | Context Features | Discriminative Classification Maximum | Infrequent Japanese Names



Post Info

Added	13 May 2011
Updated	13 May 2011
Type	Journal
Year	2010
Where	COLING
Authors	Vladimir Pervouchine, Min Zhang, Ming Liu, Haizhou Li
