Document clustering techniques mostly rely on single term analysis of the document data set, such as the Vector Space Model. To better capture the structure of documents, the unde...
1 The latent semantic indexing (LSI) methodology for information retrieval applies the singular value decomposition to identify an eigensystem for a large matrix, in which cells re...
Document clustering has many uses in natural language tools and applications. For instance, summarizing sets of documents that all describe the same event requires first identifyi...
Disambiguating person names in a set of documents (such as a set of web pages returned in response to a person name) is a key task for the presentation of results and the automatic...
A great challenge for web site designers is how to ensure users' easy access to important web pages efficiently. In this paper we present a clustering-based approach to addres...
Zhong Su, Qiang Yang, HongJiang Zhang, Xiaowei Xu,...