In this paper, we propose a document clustering method that strives to achieve: (1) a high accuracy of document clustering, and (2) the capability of estimating the number of clus...
We propose a new algorithm for dimensionality reduction and unsupervised text classification. We use mixture models as underlying process of generating corpus and utilize a novel,...
We assess a family of ranking mechanisms for search engines based on linkage analysis using a carefully engineered subset of the World Wide Web, WT10g (Bailey, Craswell and Hawking...
This paper discusses a new type of semi-supervised document clustering that uses partial supervision to partition a large set of documents. Most clustering methods organizes docum...
Figures in digital documents contain important information. Current digital libraries do not summarize and index information available within figures for document retrieval. We pr...
Xiaonan Lu, James Ze Wang, Prasenjit Mitra, C. Lee...