Sciweavers

NAACL
2004

Multiple Similarity Measures and Source-Pair Information in Story Link Detection

State-of-the-art story link detection systems, which determine whether two stories are about the same event (that is, whether they are linked), usually rely on the cosine similarity between the two stories. This paper presents a method for improving the performance of a link detection system by using a variety of similarity measures together with source-pair-specific statistical information. We investigated the utility of several similarity measures, including cosine, Hellinger, Tanimoto, and clarity, both alone and in combination. We also compared several machine learning techniques for combining the different types of information: SVMs, voting, and decision trees, each of which makes use of similarity and statistical information differently. Our experimental results indicate that combining similarity measures and source-pair-specific statistical information using an SVM provides the largest improvement in estimating whether two s...
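To make the similarity measures named in the abstract concrete, here is a minimal sketch of three of them (cosine, Tanimoto, and Hellinger) computed over raw term counts. It is not the authors' implementation: the whitespace tokenizer, the unweighted counts, the function names, and the particular Hellinger form (the sum of square roots of products of normalized term frequencies) are all assumptions made for illustration. The clarity measure is left out because it depends on a background language model that the abstract does not describe.

```python
import math
from collections import Counter


def term_counts(text):
    """Hypothetical tokenizer: lowercase, split on whitespace, count terms."""
    return Counter(text.lower().split())


def _dot(a, b):
    # Only terms present in both stories contribute to the dot product.
    return sum(a[t] * b[t] for t in a.keys() & b.keys())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return _dot(a, b) / (norm_a * norm_b)


def tanimoto(a, b):
    """Tanimoto similarity: dot / (|a|^2 + |b|^2 - dot)."""
    dot = _dot(a, b)
    denom = sum(v * v for v in a.values()) + sum(v * v for v in b.values()) - dot
    return dot / denom if denom else 0.0


def hellinger(a, b):
    """Assumed Hellinger form: sum over shared terms of sqrt(p(t) * q(t)),
    where p and q are the stories' normalized term frequencies."""
    total_a = sum(a.values())
    total_b = sum(b.values())
    if total_a == 0 or total_b == 0:
        return 0.0
    return sum(
        math.sqrt((a[t] / total_a) * (b[t] / total_b))
        for t in a.keys() & b.keys()
    )


if __name__ == "__main__":
    story_1 = term_counts("fire destroys warehouse in downtown district")
    story_2 = term_counts("downtown warehouse fire under investigation")
    for name, measure in [("cosine", cosine), ("tanimoto", tanimoto), ("hellinger", hellinger)]:
        print(name, round(measure(story_1, story_2), 3))
```

Each function returns 1 for identical stories and 0 for stories with no terms in common, so the scores could be fed as features to a combiner such as an SVM, voting scheme, or decision tree, which is the kind of combination the abstract describes.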
Francine Chen, Ayman Farahat, Thorsten Brants
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2004
Where NAACL
Authors Francine Chen, Ayman Farahat, Thorsten Brants