Frequent term-based text clustering

11 years 4 hour ago
Frequent term-based text clustering
Text clustering methods can be used to structure large sets of text or hypertext documents. The well-known methods of text clustering, however, do not really address the special problems of text clustering: very high dimensionality of the data, very large size of the databases and understandability of the cluster description. In this paper, we introduce a novel approach which uses frequent item (term) sets for text clustering. Such frequent sets can be efficiently discovered using algorithms for association rule mining. To cluster based on frequent term sets, we measure the mutual overlap of frequent sets with respect to the sets of supporting documents. We present two algorithms for frequent term-based text clustering, FTC which creates flat clusterings and HFTC for hierarchical clustering. An experimental evaluation on classical text documents as well as on web documents demonstrates that the proposed algorithms obtain clusterings of comparable quality significantly more efficiently...
Florian Beil, Martin Ester, Xiaowei Xu
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2002
Where KDD
Authors Florian Beil, Martin Ester, Xiaowei Xu
Comments (0)