
CICLING
2007
Springer

Clustering Narrow-Domain Short Texts by Using the Kullback-Leibler Distance

Clustering short texts is a difficult task in itself, and restricting them to a narrow domain poses an additional challenge for current clustering methods. We address this problem with a new measure of distance between documents based on the symmetric Kullback-Leibler distance. Although this measure is commonly used to compute the distance between two probability distributions, we have adapted it to obtain a distance value between two documents. We have carried out experiments on two different narrow-domain corpora, and our findings indicate that this measure can be used for the addressed problem, yielding results comparable to those obtained with the Jaccard similarity measure.
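The abstract turns a distance between probability distributions into a distance between documents. The Python sketch below shows one way this can work. It is an illustration under stated assumptions and does not reproduce the authors' exact procedure. Each document becomes a relative term-frequency distribution over whitespace tokens. Terms missing from one document get a small back-off probability `epsilon`, because log(p/0) is undefined. The symmetric distance is then computed in its sum form, D(P||Q) + D(Q||P) = sum over terms t of (P(t) - Q(t)) log(P(t)/Q(t)). The tokenization, the `epsilon` back-off scheme, and all function names are hypothetical. The sum form is also only one of several symmetric KL variants. A plain Jaccard similarity over term sets is included for comparison.

```python
import math
from collections import Counter


def term_distribution(text):
    """Relative frequency of each lowercase, whitespace-separated token."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot build a distribution from an empty text")
    return {term: c / total for term, c in counts.items()}


def smooth(dist, vocabulary, epsilon=1e-4):
    """Hypothetical back-off: every vocabulary term absent from `dist` gets
    probability `epsilon`; observed terms are rescaled so the total stays 1."""
    missing = [t for t in vocabulary if t not in dist]
    scale = 1.0 - epsilon * len(missing)
    if scale <= 0:
        raise ValueError("epsilon too large for this vocabulary size")
    smoothed = {t: p * scale for t, p in dist.items()}
    for t in missing:
        smoothed[t] = epsilon
    return smoothed


def symmetric_kl(doc_a, doc_b, epsilon=1e-4):
    """Symmetric KL distance (sum form) between two documents' term distributions."""
    p = term_distribution(doc_a)
    q = term_distribution(doc_b)
    vocabulary = set(p) | set(q)
    p = smooth(p, vocabulary, epsilon)
    q = smooth(q, vocabulary, epsilon)
    # D(P||Q) + D(Q||P) simplifies to sum_t (p_t - q_t) * log(p_t / q_t)
    return sum((p[t] - q[t]) * math.log(p[t] / q[t]) for t in vocabulary)


def jaccard_similarity(doc_a, doc_b):
    """Jaccard similarity between the term sets of two documents."""
    a, b = set(doc_a.lower().split()), set(doc_b.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    a = "gene expression in yeast cells"
    b = "gene expression in human cells"
    c = "stock market prices fell sharply"
    print(symmetric_kl(a, b), symmetric_kl(a, c))
    print(jaccard_similarity(a, b), jaccard_similarity(a, c))
```

The two narrow-domain-style sentences `a` and `b` share most of their terms, so their distance is much smaller than the distance between `a` and the unrelated `c`. A clustering algorithm could consume a pairwise matrix of such distances in place of one built from (1 - Jaccard similarity). The choice of `epsilon` matters for very short texts, since most of the vocabulary is absent from any single document.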
Added 07 Jun 2010
Updated 07 Jun 2010
Type Conference
Year 2007
Where CICLING
Authors David Pinto, José-Miguel Benedí, Paolo Rosso