The performance of document clustering systems depends on employing optimal text representations, which are not only difficult to determine beforehand, but also may vary from one ...
Document clustering has been used for better document retrieval, document browsing, and text mining in digital library. In this paper, we perform a comprehensive comparison study ...
More and more documents on the World Wide Web are based on templates. On a technical level this causes those documents to have a quite similar source code and DOM tree structure. G...
We consider the problem of clustering data lying on multiple subspaces of unknown and possibly different dimensions. We show that one can represent the subspaces with a set of pol...
Coreferencing entities across documents in a large corpus enables advanced document understanding tasks such as question answering. This paper presents a novel cross document core...
Jian Huang 0002, Sarah M. Taylor, Jonathan L. Smit...