SIGIR
2005
ACM

Using term informativeness for named entity detection

Informal communication (e-mail, bulletin boards) poses a difficult learning environment because traditional grammatical and lexical information is noisy. Other information is necessary for tasks such as named entity detection. How topic-centric, or informative, a word is can be a valuable signal. It is well known that informative words are best modeled by “heavy-tailed” distributions, such as mixture models. However, existing informativeness scores do not take full advantage of this fact. We introduce a new informativeness score that directly uses mixture model likelihood to identify informative words. We evaluate its effectiveness on the task of extracting restaurant names from bulletin board posts. We find that our “mixture score” is weakly effective alone and highly effective when combined with Inverse Document Frequency (IDF). We compare against other informativeness criteria and find that only Residual IDF is competitive with our combined IDF/Mixture score. Cat...
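
For readers unfamiliar with the two baseline scores named in the abstract, the following is a minimal sketch of IDF and Residual IDF in their standard textbook forms. It is not the authors' code and does not implement their mixture score. The function names, the base-2 logarithm, and the representation of each document as a list of tokens are assumptions made for illustration. Residual IDF is taken here as observed IDF minus the IDF a Poisson model would predict from the word's average count per document.

```python
import math
from collections import Counter


def idf_scores(docs):
    """Inverse Document Frequency: -log2(fraction of documents containing the word).

    `docs` is assumed to be a list of token lists (hypothetical input format).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word at most once per document
    return {w: -math.log2(df / n_docs) for w, df in doc_freq.items()}


def residual_idf_scores(docs):
    """Residual IDF: observed IDF minus the IDF expected under a Poisson model.

    Under a Poisson model with rate = (total occurrences) / (number of documents),
    the probability that a document contains the word is 1 - exp(-rate).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    coll_freq = Counter()
    for doc in docs:
        coll_freq.update(doc)      # total occurrences across the collection
        doc_freq.update(set(doc))  # number of documents containing the word
    scores = {}
    for w, df in doc_freq.items():
        idf = -math.log2(df / n_docs)
        rate = coll_freq[w] / n_docs
        expected_idf = -math.log2(1.0 - math.exp(-rate))
        scores[w] = idf - expected_idf
    return scores


# Toy collection (made up for illustration, not data from the paper).
docs = [
    ["sushi", "sushi", "sushi", "the"],
    ["the", "pizza"],
    ["the"],
    ["the", "good"],
]
idf = idf_scores(docs)
ridf = residual_idf_scores(docs)
```

In this toy collection, a word whose occurrences are concentrated in one document ("sushi") receives a higher Residual IDF than a word that appears once ("pizza") or in every document ("the"), which is the burstiness that heavy-tailed models are meant to capture.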
Jason D. M. Rennie, Tommi Jaakkola
Added 26 Jun 2010
Updated 26 Jun 2010
Type Conference
Year 2005
Where SIGIR