COLING
2002

A Measure of Term Representativeness Based on the Number of Co-occurring Salient Words

We propose a novel measure of the representativeness (i.e., indicativeness or topic specificity) of a term in a given corpus. The measure embodies the idea that the distribution of words co-occurring with a representative term should be biased relative to the word distribution in the whole corpus. This bias is quantified as the number of distinct words whose occurrences are saliently biased among the co-occurring words. The saliency of a word is defined by a threshold probability that can be determined automatically from the whole corpus. A comparative evaluation showed that the measure is clearly superior to conventional measures at finding topic-specific words in newspaper archives of different sizes.
Toru Hisamitsu, Yoshiki Niwa
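
The counting idea in the abstract can be illustrated with a rough sketch. The abstract does not give the exact saliency test or how the threshold is derived, so everything below is an assumption for illustration only: a word is treated as salient when a binomial tail probability, computed from its whole-corpus frequency, falls below a fixed `threshold`. The names `binomial_tail`, `salient_word_count`, and `toy_corpus` are hypothetical, and this is not the authors' implementation.

```python
from collections import Counter
from math import comb


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def salient_word_count(documents, term, threshold=0.01):
    """Count distinct words that are over-represented near `term`.

    Assumptions (not taken from the paper): co-occurrence means "appears in
    a document that contains the term", the term itself is excluded, and
    saliency is a binomial tail test against a fixed threshold.
    """
    corpus_counts = Counter(w for doc in documents for w in doc)
    corpus_size = sum(corpus_counts.values())

    # Bag of words from every document that contains the term.
    cooc_counts = Counter(
        w for doc in documents if term in doc for w in doc if w != term
    )
    n = sum(cooc_counts.values())

    salient = 0
    for word, k in cooc_counts.items():
        p = corpus_counts[word] / corpus_size  # whole-corpus probability
        # Salient if seeing k or more occurrences by chance is unlikely.
        if binomial_tail(n, k, p) < threshold:
            salient += 1
    return salient


# Toy corpus: 2 sports documents and 18 finance documents.
toy_corpus = (
    [["baseball", "pitcher", "inning", "pitcher", "inning", "the"]] * 2
    + [["the", "market", "stock", "price", "the", "bank"]] * 18
)

print(salient_word_count(toy_corpus, "baseball"))
print(salient_word_count(toy_corpus, "the"))
```

On this toy corpus the topic-specific term "baseball" scores higher than the generic term "the", because "pitcher" and "inning" are strongly over-represented in the documents containing "baseball", while the words around "the" follow the corpus distribution closely.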
Type Journal