Text Mining with an Augmented Version of the Bisecting K-Means Algorithm

The number of electronic documents available today is ever increasing, and the task of organizing and categorizing this growing corpus has become too large to perform manually. In this paper, we propose an augmented version of the bisecting k-means clustering algorithm for automated text categorization. Our augmented version adds (1) a bootstrap aggregating procedure, (2) a bisecting criterion that relies on the dispersion of data within clusters, and (3) a method that automatically terminates the algorithm once an optimal number of clusters has been produced. We performed text categorization experiments to compare our algorithm against the standard bisecting k-means and k-means algorithms. The results showed that our augmented version improved classification accuracy by approximately 15% and 20% compared to the standard bisecting k-means and k-means, respectively.
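
To make the dispersion-based bisecting idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the dispersion measure, the bootstrap aggregating step, or the stopping rule, so this sketch assumes dispersion is the within-cluster sum of squared distances to the centroid, omits bagging, and stops at a fixed, user-supplied k. All function names are hypothetical, and the toy vectors stand in for document features such as TF-IDF weights.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def dispersion(points):
    """Assumed dispersion measure: sum of squared distances to the centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)

def two_means(points, rng, iters=20):
    """Split one cluster in two with a basic 2-means loop."""
    centers = rng.sample(points, 2)
    halves = ([], [])
    for _ in range(iters):
        halves = ([], [])
        for p in points:
            nearer = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            halves[nearer].append(p)
        if not halves[0] or not halves[1]:
            break  # degenerate split, e.g. duplicate starting points
        centers = [centroid(h) for h in halves]
    return halves

def bisecting_kmeans(points, k, seed=0):
    """Repeatedly bisect the most dispersed cluster until k clusters exist."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        candidates = [c for c in clusters if len(c) > 1]
        if not candidates:
            break
        # Bisecting criterion (assumption): split the cluster whose
        # members are most spread out around their centroid.
        target = max(candidates, key=dispersion)
        left, right = two_means(target, rng)
        if not left or not right:
            break
        clusters.remove(target)
        clusters.extend([left, right])
    return clusters

# Toy stand-in for document vectors: two well-separated groups.
vectors = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
clusters = bisecting_kmeans(vectors, k=2)
```

Standard bisecting k-means often picks the largest cluster to split; choosing by dispersion instead favours splitting clusters that are internally heterogeneous, which is the kind of criterion the abstract describes.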

Yutaro Hatagami, Toshihiko Matsuka

Algorithms | ICONIP 2009 | Information Technology | K-means | Standard Bisecting K-means |

Post Info

Added	19 Feb 2011
Updated	19 Feb 2011
Type	Journal
Year	2009
Where	ICONIP
Authors	Yutaro Hatagami, Toshihiko Matsuka
