Sciweavers

VLDB
2005
ACM

Discovering Large Dense Subgraphs in Massive Graphs

13 years 10 months ago
Discovering Large Dense Subgraphs in Massive Graphs
We present a new algorithm for finding large, dense subgraphs in massive graphs. Our algorithm is based on a recursive application of fingerprinting via shingles, and is extremely efficient, capable of handling graphs with tens of billions of edges on a single machine with modest resources. We apply our algorithm to characterize the large, dense subgraphs of a graph showing connections between hosts on the World Wide Web; this graph contains over 50M hosts and 11B edges, gathered from 2.1B web pages. We measure the distribution of these dense subgraphs and their evolution over time. We show that more than half of these hosts participate in some dense subgraph found by the analysis. There are several hundred giant dense subgraphs of at least ten thousand hosts; two thousand dense subgraphs at least a thousand hosts; and almost 64K dense subgraphs of at least a hundred hosts. Upon examination, many of the dense subgraphs output by our algorithm are link spam, i.e., websites that attem...
David Gibson, Ravi Kumar, Andrew Tomkins
Added 28 Jun 2010
Updated 28 Jun 2010
Type Conference
Year 2005
Where VLDB
Authors David Gibson, Ravi Kumar, Andrew Tomkins
Comments (0)