Sciweavers

COLING
1996

Good Bigrams

A desirable property of a measure of connective strength in bigrams is that it be insensitive to corpus size. This paper investigates the stability of three measures across text genres and as the corpus is expanded. The measures are (1) the commonly used mutual information, (2) the difference in mutual information, and (3) raw occurrence. Mutual information is further compared with an approach that uses knowledge about genres to remove the overlap between them. This last approach considers the difference between two products of the same process (human text generation) constrained by different genres. The cancellation of overlap seems to provide the most specific word pairs for each genre.
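To make two of the named measures concrete, here is a minimal sketch of raw occurrence and mutual information for adjacent word pairs. It assumes the standard pointwise definition, log2(P(x,y) / (P(x) P(y))), which this listing does not spell out, so the paper's exact formulation may differ. The function name `bigram_scores` and the toy sentence are illustrative only, and the difference in mutual information is left out because it is not defined here.

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Return {(w1, w2): (raw_count, pmi)} for adjacent word pairs.

    Assumption: PMI = log2(P(w1, w2) / (P(w1) * P(w2))), with probabilities
    estimated as relative frequencies in the given token list.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = (count, math.log2(p_xy / (p_x * p_y)))
    return scores


# Toy corpus, purely illustrative.
tokens = "new york is big and new york is old".split()
scores = bigram_scores(tokens)

# List pairs from strongest to weakest association.
for pair, (count, pmi) in sorted(scores.items(), key=lambda kv: -kv[1][1]):
    print(pair, count, round(pmi, 3))
```

Running the same function on progressively larger slices of a corpus, or on corpora from different genres, is one simple way to observe how each score shifts with corpus size.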
Christer Johansson
Added 02 Nov 2010
Updated 02 Nov 2010
Type Conference
Year 1996
Where COLING
Authors Christer Johansson