Word Clustering and Word Selection Based Feature Reduction for MaxEnt Based Hindi NER

Statistical machine learning methods are employed to train a Named Entity Recognizer from annotated data. Methods such as Maximum Entropy and Conditional Random Fields rely on features for training. These methods tend to overfit when the available training corpus is limited, especially if the number of features is large or a feature can take a large number of values. To overcome this, we propose two techniques for feature reduction, based on word clustering and word selection. A number of word similarity measures are proposed for clustering words for the Named Entity Recognition task. A few corpus-based statistical measures are used for important word selection. The feature reduction techniques lead to a substantial performance improvement over the baseline Maximum Entropy technique.
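The abstract does not say which similarity or selection measures the authors use, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a precomputed word-to-cluster mapping and a simple, hypothetical selection statistic (the share of a word's occurrences that fall next to a named entity). The function names, thresholds, and toy data are all illustrative assumptions.

```python
from collections import Counter


def select_important_words(context_words, all_words, min_count=2, min_ratio=0.5):
    """Pick words that occur mostly next to named entities.

    context_words: tokens observed adjacent to a named entity (assumed input).
    all_words: every token in the corpus.
    The ratio used here is an illustrative statistic, not the paper's measure.
    """
    near_ne = Counter(context_words)
    overall = Counter(all_words)
    return {
        w
        for w, c in near_ne.items()
        # max(...) guards against a context word missing from all_words
        if c >= min_count and c / max(overall[w], c) >= min_ratio
    }


def reduce_word_feature(word, important, clusters):
    """Map a word to a value in a smaller feature space.

    Clustered words share one value per cluster, selected words keep their
    identity, and every other word collapses into a single OTHER value.
    """
    if word in clusters:
        return "CLUSTER_" + str(clusters[word])
    if word in important:
        return "WORD_" + word
    return "OTHER"


# Toy English data, for illustration only.
all_words = ["mr", "smith", "said", "mr", "jones", "said",
             "the", "the", "city", "of", "delhi"]
context_words = ["mr", "mr", "said", "of"]
clusters = {"city": 0, "town": 0}  # hypothetical precomputed clusters

important = select_important_words(context_words, all_words)
features = [reduce_word_feature(w, important, clusters)
            for w in ["city", "town", "mr", "the"]]
```

In this sketch the word feature can take only as many values as there are clusters and selected words, plus one, instead of one value per vocabulary item. This is the kind of reduction that limits overfitting on a small training corpus.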
Sujan Kumar Saha, Pabitra Mitra, Sudeshna Sarkar
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where ACL
Authors Sujan Kumar Saha, Pabitra Mitra, Sudeshna Sarkar