Sciweavers

CICLING
2005
Springer

Name Discrimination by Clustering Similar Contexts

It is relatively common for different people or organizations to share the same name. Given the increasing amount of information available online, this results in an ever-growing possibility of finding misleading or incorrect information due to confusion caused by an ambiguous name. This paper presents an unsupervised approach that resolves name ambiguity by clustering the instances of a given name into groups, each of which is associated with a distinct underlying entity. The features we employ to represent the context of an ambiguous name are statistically significant bigrams that occur in the same context as the ambiguous name. From these features we create a co-occurrence matrix where the rows and columns represent the first and second words in bigrams, and the cells contain their log-likelihood scores. Then we represent each of the contexts in which an ambiguous name appears with a second-order context vector. This is created by taking the average of the vectors from the ...
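The feature construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`log_likelihood`, `build_matrix`, `second_order_vector`), the whitespace tokenization, and the significance cutoff of 3.841 (the chi-square critical value at p = 0.05 with one degree of freedom) are all assumptions made for illustration. Because the abstract is truncated before it says which vectors are averaged, the sketch also assumes they are the matrix rows of the words found in a context. The clustering step is left out, since the abstract does not name the algorithm.

```python
import math
from collections import Counter


def log_likelihood(n11, n1p, np1, npp):
    """Log-likelihood ratio (G^2) for the 2x2 contingency table of a bigram.

    n11: count of (w1, w2); n1p: bigrams starting with w1;
    np1: bigrams ending with w2; npp: total number of bigrams.
    """
    observed = [
        n11,                    # w1 followed by w2
        n1p - n11,              # w1 followed by another word
        np1 - n11,              # another word followed by w2
        npp - n1p - np1 + n11,  # neither w1 first nor w2 second
    ]
    expected = [
        n1p * np1 / npp,
        n1p * (npp - np1) / npp,
        (npp - n1p) * np1 / npp,
        (npp - n1p) * (npp - np1) / npp,
    ]
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)


def build_matrix(tokens, threshold=3.841):
    """Sparse co-occurrence matrix: matrix[w1][w2] = log-likelihood score.

    Only bigrams scoring at or above `threshold` (an assumed cutoff) are kept.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    first, second = Counter(), Counter()
    for (w1, w2), count in bigrams.items():
        first[w1] += count
        second[w2] += count
    total = sum(bigrams.values())
    matrix = {}
    for (w1, w2), count in bigrams.items():
        score = log_likelihood(count, first[w1], second[w2], total)
        if score >= threshold:
            matrix.setdefault(w1, {})[w2] = score
    return matrix


def second_order_vector(context, matrix, columns):
    """Average the matrix rows of the context words that have a row.

    Assumption: the abstract is cut off before it specifies which vectors
    are averaged; using the rows of the context words is a guess.
    """
    rows = [matrix[w] for w in context if w in matrix]
    if not rows:
        return [0.0] * len(columns)
    return [sum(r.get(c, 0.0) for r in rows) / len(rows) for c in columns]


# Toy usage on a made-up token stream (hypothetical data).
tokens = "new york new york new york city hall city hall".split()
matrix = build_matrix(tokens)
columns = sorted({w2 for row in matrix.values() for w2 in row})
vector = second_order_vector(["new", "city"], matrix, columns)
```

Each context of an ambiguous name would yield one such vector, and those vectors would then be passed to a clustering algorithm. Note that the G^2 statistic flags bigrams that occur together less often than chance as well as more often; a real implementation might filter on the direction of the association as well.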
Ted Pedersen, Amruta Purandare, Anagha Kulkarni
Added 26 Jun 2010
Updated 26 Jun 2010
Type Conference
Year 2005
Where CICLING
Authors Ted Pedersen, Amruta Purandare, Anagha Kulkarni