The application of document clustering to information retrieval has been motivated by the potential effectiveness gains postulated by the Cluster Hypothesis. The hypothesis states ...
Document clustering techniques mostly rely on single term analysis of the document data set, such as the Vector Space Model. To better capture the structure of documents, the unde...
In this study we propose sketching algorithms for computing similarities between hierarchical data. Specifically, we look at data objects that are represented using leaf-labeled t...
Classifying an unknown input is a fundamental problem in pattern recognition. A common method is to define a distance metric between patterns and find the most similar pattern i...
Sung-Hyuk Cha, Charles C. Tappert, Sargur N. Sriha...
This paper proposes a clustering approach that explores both the content and the structure of XML documents for determining similarity among them. Assuming that the content and th...