We propose a technique for measuring the structural similarity of semistructured documents based on entropy. After extracting the structural information from two documents we use ...
Clustering is a well known technique that allows scalability and fault tolerance in distributed systems. In the J2EE framework, clustering can be used to improve the performance a...
Left unchecked, the fundamental drive to increase peak performance using tens of thousands of power hungry components will lead to intolerable operating costs and failure rates. H...
Most traditional text clustering methods are based on "bag of words" (BOW) representation based on frequency statistics in a set of documents. BOW, however, ignores the ...
Jian Hu, Lujun Fang, Yang Cao, Hua-Jun Zeng, Hua L...
In this paper we propose a new information-theoretic divisive algorithm for word clustering applied to text classification. In previous work, such "distributional clustering&...
Inderjit S. Dhillon, Subramanyam Mallela, Rahul Ku...