Sciweavers

JCDL
2006
ACM

A comprehensive comparison study of document clustering for a biomedical digital library MEDLINE

Document clustering has been used to improve document retrieval, document browsing, and text mining in digital libraries. In this paper, we perform a comprehensive comparison study of several document clustering approaches, namely three hierarchical methods (single-link, complete-link, and average-link), Bisecting K-means, K-means, and Suffix Tree Clustering, in terms of efficiency, effectiveness, and scalability. In addition, we apply a domain ontology to document clustering to investigate whether an ontology such as MeSH improves clustering quality for MEDLINE articles. Because an ontology is a formal, explicit specification of a shared conceptualization for a domain of interest, the use of ontologies is a natural way to solve traditional information retrieval problems such as the synonym/hypernym/hyponym problems. We conducted fairly extensive experiments based on different evaluation metrics such as misclassification index, F-measure, cluster purity, and entropy on very large a...
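Two of the evaluation metrics named in the abstract, cluster purity and entropy, can be illustrated with a minimal sketch. The definitions below are the common textbook ones (majority-class fraction for purity, size-weighted base-2 class entropy per cluster); the paper's exact formulas may differ, and the function names purity, entropy, and group_labels are hypothetical, chosen only for this illustration.

```python
import math
from collections import Counter, defaultdict


def group_labels(clusters, labels):
    """Collect the true class labels of the documents in each cluster."""
    groups = defaultdict(list)
    for cluster_id, label in zip(clusters, labels):
        groups[cluster_id].append(label)
    return groups


def purity(clusters, labels):
    """Fraction of documents that belong to the majority class of their cluster."""
    groups = group_labels(clusters, labels)
    majority_total = sum(max(Counter(ys).values()) for ys in groups.values())
    return majority_total / len(labels)


def entropy(clusters, labels):
    """Size-weighted average of the class entropy (base 2) inside each cluster."""
    groups = group_labels(clusters, labels)
    n = len(labels)
    total = 0.0
    for ys in groups.values():
        size = len(ys)
        h = -sum((k / size) * math.log2(k / size) for k in Counter(ys).values())
        total += (size / n) * h
    return total


# Toy data (assumed, not from the paper): six documents, two clusters, two classes.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", "b", "b", "b"]
print(purity(clusters, labels))
print(entropy(clusters, labels))
```

Higher purity and lower entropy both indicate clusters that align more closely with the reference classes; a perfect clustering gives purity 1 and entropy 0 under these definitions.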
Illhoi Yoo, Xiaohua Hu
Added 14 Jun 2010
Updated 14 Jun 2010
Type Conference
Year 2006
Where JCDL
Authors Illhoi Yoo, Xiaohua Hu