Sciweavers

ACL
1993

Contextual Word Similarity and Estimation from Sparse Data

In recent years there has been much interest in word cooccurrence relations, such as n-grams, verb-object combinations, or cooccurrence within a limited context. This paper discusses how to estimate the likelihood of cooccurrences that do not occur in the training data. We present a method that makes local analogies between each specific unobserved cooccurrence and other cooccurrences that contain similar words. These analogies are based on the assumption that similar word cooccurrences have similar values of mutual information. Accordingly, the word similarity metric captures similarities between vectors of mutual information values. Our evaluation suggests that this method performs better than existing frequency-based smoothing methods, and may provide an alternative to class-based models. A background survey is included, covering issues of lexical cooccurrence, data sparseness and smoothing, word similarity and clustering, and mutual information.
Ido Dagan, Shaul Marcus, Shaul Markovitch
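
The idea summarized in the abstract (estimating an unseen cooccurrence from the mutual information of cooccurrences that involve similar words) can be illustrated with a minimal sketch. This is not the paper's exact formulation: the min/max overlap used as the similarity metric, the restriction to positive mutual information values, the similarity-weighted average, and all function names (`pmi_table`, `similarity`, `estimate_probability`) are assumptions made for illustration only.

```python
import math
from collections import Counter


def pmi_table(pair_counts):
    """Compute pointwise mutual information (in bits) for every observed pair.

    pair_counts maps (w1, w2) -> count. Returns the PMI table together with
    the marginal probabilities of left words and right words.
    """
    total = sum(pair_counts.values())
    left, right = Counter(), Counter()
    for (w1, w2), c in pair_counts.items():
        left[w1] += c
        right[w2] += c
    p_left = {w: c / total for w, c in left.items()}
    p_right = {w: c / total for w, c in right.items()}
    pmi = {
        (w1, w2): math.log2((c / total) / (p_left[w1] * p_right[w2]))
        for (w1, w2), c in pair_counts.items()
    }
    return pmi, p_left, p_right


def similarity(pmi, a, b):
    """Assumed metric: min/max overlap of the positive-PMI vectors of a and b."""
    va = {w2: v for (w1, w2), v in pmi.items() if w1 == a and v > 0}
    vb = {w2: v for (w1, w2), v in pmi.items() if w1 == b and v > 0}
    keys = set(va) | set(vb)
    num = sum(min(va.get(k, 0.0), vb.get(k, 0.0)) for k in keys)
    den = sum(max(va.get(k, 0.0), vb.get(k, 0.0)) for k in keys)
    return num / den if den > 0 else 0.0


def estimate_probability(pair_counts, w1, w2):
    """Estimate P(w1, w2) for a pair that may be absent from pair_counts.

    The PMI of (w1, w2) is approximated by a similarity-weighted average of
    PMI(w', w2) over left words w' that were observed with w2. Returns None
    when no similar word provides evidence.
    """
    pmi, p_left, p_right = pmi_table(pair_counts)
    if w1 not in p_left or w2 not in p_right:
        return None
    weighted, weights = 0.0, 0.0
    for other in p_left:
        if other == w1 or (other, w2) not in pmi:
            continue
        s = similarity(pmi, w1, other)
        weighted += s * pmi[(other, w2)]
        weights += s
    if weights == 0:
        return None
    est_pmi = weighted / weights
    # Invert the PMI definition: P(w1, w2) = P(w1) * P(w2) * 2^PMI
    return p_left[w1] * p_right[w2] * 2 ** est_pmi


# Toy verb-object counts; ("devour", "apple") is never observed.
counts = {
    ("eat", "apple"): 4,
    ("eat", "bread"): 4,
    ("devour", "bread"): 2,
    ("read", "book"): 6,
}
estimate = estimate_probability(counts, "devour", "apple")
```

In this toy data, "devour" shares the object "bread" with "eat", so the estimate for the unseen pair borrows the mutual information of ("eat", "apple"). A pair such as ("read", "apple") yields no estimate, because "read" has no overlap with any verb seen with "apple".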
Added 02 Nov 2010
Updated 02 Nov 2010
Type Conference
Year 1993
Where ACL