We address the problem of clustering words (or constructing a thesaurus) based on co-occurrence data, and using the acquired word classes to improve the accuracy of syntactic disa...
A web search with double checking model is proposed to explore the web as a live corpus. Five association measures including variants of Dice, Overlap Ratio, Jaccard, and Cosine, ...
Given the recent trend to evaluate the performance of word sense disambiguation systems in a more application-oriented set-up, we report on the construction of a multilingual benc...
This paper introduces a novel statistical mixture model for probabilistic clustering of histogram data and, more generally, for the analysis of discrete co occurrence data. Adoptin...
Word clustering is important for automatic thesaurus construction, text classification, and word sense disambiguation. Recently, several studies have reported using the web as a c...
Yutaka Matsuo, Takeshi Sakaki, Koki Uchiyama, Mits...