
NAACL
2003

Frequency Estimates for Statistical Word Similarity Measures

Statistical measures of word similarity have applications in many areas of natural language processing, such as language modeling and information retrieval. We report a comparative study of two methods for estimating the word cooccurrence frequencies that word similarity measures require. Our frequency estimates are generated from a terabyte-sized corpus of Web data, and we study the impact of corpus size on the effectiveness of the measures. We base the evaluation on one TOEFL question set and two practice question sets. Each set consists of multiple-choice questions that ask for the best synonym of a given target word. For two of the question sets, a context for the target word is provided, and we examine a number of word similarity measures that exploit this context. Our best combination of similarity measure and frequency estimation method answers 6-8% more questions than the best results previously reported for the same question sets.
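The abstract does not name its similarity measures or its two estimation methods. The sketch below is an illustration only. It assumes pointwise mutual information (PMI), one common statistical similarity measure, and takes cooccurrence counts from a fixed-size token window. The function names and the toy corpus are hypothetical. The code answers one TOEFL-style synonym question by picking the candidate with the highest PMI to the target word.

```python
import math
from collections import Counter


def count_frequencies(tokens, window=2):
    """Count unigram frequencies and unordered cooccurrences of word pairs
    whose positions are at most `window` tokens apart. This is a
    window-based estimate, assumed here for illustration only."""
    unigrams = Counter(tokens)
    pairs = Counter()
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            pairs[frozenset((word, tokens[j]))] += 1
    return unigrams, pairs


def pmi(a, b, unigrams, pairs, total):
    """Pointwise mutual information: log2(f(a,b) * N / (f(a) * f(b))).
    Returns -inf when the pair never cooccurs in the counted windows."""
    joint = pairs[frozenset((a, b))]
    if joint == 0 or unigrams[a] == 0 or unigrams[b] == 0:
        return float("-inf")
    return math.log2(joint * total / (unigrams[a] * unigrams[b]))


def answer_synonym_question(target, candidates, unigrams, pairs, total):
    """Choose the candidate whose PMI with the target word is highest."""
    return max(candidates, key=lambda c: pmi(target, c, unigrams, pairs, total))


# Hypothetical toy corpus; the paper's estimates come from terabyte-scale Web data.
tokens = "big large house big large car small cat tiny mouse".split()
unigrams, pairs = count_frequencies(tokens, window=2)
choice = answer_synonym_question("big", ["large", "small", "cat", "mouse"],
                                 unigrams, pairs, len(tokens))
print(choice)
```

On this toy input, "big" and "large" cooccur three times within the window, and no other candidate ever appears near "big". The script therefore prints `large`. With real data, the choice of window size, or of a document-level count instead, changes the frequency estimates. The abstract frames exactly this kind of choice as the object of its comparison.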
Egidio L. Terra, Charles L. A. Clarke
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2003
Where NAACL
Authors Egidio L. Terra, Charles L. A. Clarke