Scalable Discovery of Best Clusters on Large Graphs

The identification of clusters, well-connected components in a graph, is useful in many applications, from biological function prediction to social community detection. However, finding these clusters can be difficult as graph sizes increase. Most current graph clustering algorithms scale poorly in terms of time or memory. An important insight is that many clustering applications need only the subset of best clusters, and not all clusters in the entire graph. In this paper we propose a new technique, Top Graph Clusters (TopGC), which probabilistically searches large, edge-weighted, directed graphs for their best clusters in linear time. The algorithm is inherently parallelizable and is able to find variable-size, overlapping clusters. To increase scalability, a parameter is introduced that controls memory use. When compared with three other state-of-the-art clustering techniques, TopGC achieves running time speedups of up to 70% on large-scale, real-world datasets. In addition, the...
Kathy Macropol, Ambuj K. Singh
Added 30 Jan 2011
Updated 30 Jan 2011
Type Journal
Year 2010