We propose an entropy-based technique for measuring the structural similarity of semistructured documents. After extracting the structural information from two documents, we use ...
We present the first constant-factor approximation algorithm for the metric k-median problem. The k-median problem is one of the most well-studied clustering problems, i.e., those...
In this paper, we propose a semi-supervised framework for learning a weighted Euclidean subspace in which the best clustering can be achieved. Our approach capitalizes on user-const...
Maria Halkidi, Dimitrios Gunopulos, Nitin Kumar, M...
In this paper, we explore the efficient clustering of item data. Unlike those of traditional data, the features of item data are known to be of high dimensionality and...
Searching for people on the Web is one of the most common query types submitted to web search engines today. However, when a person name is queried, the returned webpages often contain ...
Dmitri V. Kalashnikov, Rabia Nuray-Turan, Sharad M...