Sciweavers

CAINE
2003

A Genetic Algorithm for Clustering on Very Large Data Sets

Clustering is the process of subdividing an input data set into a desired number of subgroups so that members of the same subgroup are similar and members of different subgroups have diverse properties. Many heuristic algorithms have been applied to the clustering problem, which is known to be NP-hard. Genetic algorithms have been used in a wide variety of fields to perform clustering; however, the technique normally has a long running time relative to the input set size. This paper proposes an efficient genetic algorithm for clustering on very large data sets. The genetic algorithm uses the most time-efficient traditional techniques along with preprocessing of the input data set. We test our algorithm on both artificial and real data sets, all of them large. The experimental results show that our algorithm outperforms the k-means algorithm in both running time and clustering quality.
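The abstract does not describe the chromosome encoding, the genetic operators, or the preprocessing step, so the following is only a generic sketch of how a genetic algorithm can be applied to clustering, not the authors' method. It assumes a centroid-based encoding (each chromosome is a list of k candidate centroids), a sum-of-squared-errors fitness, truncation selection, one-point crossover, and Gaussian mutation. The names `sse` and `ga_cluster` and all parameter values are hypothetical choices made for illustration.

```python
import random


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total


def ga_cluster(points, k, pop_size=20, generations=40, mutation_rate=0.1, seed=0):
    """Illustrative GA: evolve sets of k centroids that minimise SSE.

    Assumes len(points) >= k and pop_size >= 4. Not the paper's algorithm.
    """
    rng = random.Random(seed)
    # Each chromosome is a list of k centroids drawn from the data points.
    population = [rng.sample(points, k) for _ in range(pop_size)]
    for _ in range(generations):
        # Lower SSE is better; keep the better half unchanged (elitism).
        population.sort(key=lambda ch: sse(points, ch))
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # One-point crossover on the list of centroids.
            cut = rng.randint(1, k - 1) if k > 1 else 0
            child = a[:cut] + b[cut:]
            # Gaussian mutation: occasionally jitter a centroid's coordinates.
            child = [
                tuple(x + rng.gauss(0, 0.1) for x in c)
                if rng.random() < mutation_rate
                else c
                for c in child
            ]
            children.append(child)
        population = survivors + children
    return min(population, key=lambda ch: sse(points, ch))
```

Calling `ga_cluster(points, k=2)` on a list of coordinate tuples returns the best set of centroids found. Because `sse` scans every point for every chromosome in every generation, a sketch like this shows why a naive genetic approach has a long running time on large inputs, which is the cost the paper sets out to reduce.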
Jim Gasvoda, Qin Ding
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2003
Where CAINE
Authors Jim Gasvoda, Qin Ding