A major challenge in document clustering is the extremely high dimensionality. For example, the vocabulary for a document set can easily be thousands of words. On the other hand, ...
Different from familiar clustering objects, text documents have sparse data spaces. A common way of representing a document is as a bag of its component words, but the semantic re...
In this investigation we propose a novel summarization method of Web pages using hierarchical expression. We discuss close relationship between summarization and hierarchical clust...
Clustering by document concepts is a powerful way of retrieving information from a large number of documents. This task in general does not make any assumption on the data distrib...
In this paper we propose a new information-theoretic divisive algorithm for word clustering applied to text classification. In previous work, such "distributional clustering&...
Inderjit S. Dhillon, Subramanyam Mallela, Rahul Ku...