SDM
2010
SIAM

Radius Plots for Mining Tera-byte Scale Graphs: Algorithms, Patterns, and Observations

Given large, multi-million node graphs (e.g., Facebook, web crawls), how do they evolve over time? How are they connected? What are the central nodes and the outliers of the graphs? We show that the Radius Plot (pdf of node radii) can answer these questions. However, computing the Radius Plot is prohibitively expensive for graphs reaching the planetary scale. There are two major contributions in this paper: (a) We propose HADI (HAdoop DIameter and radii estimator), a carefully designed and fine-tuned algorithm to compute the diameter of massive graphs, which runs on top of the HADOOP/MAPREDUCE system with excellent scale-up on the number of available machines; (b) We run HADI on several real-world datasets, including YahooWeb (6B edges, 1/8 of a Terabyte), one of the largest public graphs ever analyzed. Thanks to HADI, we report fascinating patterns on large networks, like the surprisingly small effective diameter, the multi-modal/bi-modal shape of the Radius Plot, and its pa...
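To make the term "Radius Plot" concrete, here is a minimal Python sketch for a toy graph. It is not HADI: it swaps in an exact breadth-first search per node, which is only feasible for small graphs, and it assumes that a node's radius means its eccentricity (the largest shortest-path distance from that node to any other node). The function names and the toy graph are hypothetical and chosen for illustration only.

```python
from collections import Counter, deque


def node_radii(adj):
    """Return {node: radius}, taking radius as eccentricity (an assumption).

    Runs one exact BFS per node on an unweighted, connected graph given
    as an adjacency dict. This is a small-graph illustration, not HADI.
    """
    radii = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Largest hop count reached from this source.
        radii[source] = max(dist.values())
    return radii


def radius_plot(radii):
    """Return {radius: fraction of nodes with that radius}, i.e. the pdf."""
    counts = Counter(radii.values())
    n = len(radii)
    return {r: counts[r] / n for r in sorted(counts)}


# Hypothetical toy graph: the path 0-1-2-3-4 plus a leaf 5 attached to node 2.
adj = {
    0: [1],
    1: [0, 2],
    2: [1, 3, 5],
    3: [2, 4],
    4: [3],
    5: [2],
}

radii = node_radii(adj)
plot = radius_plot(radii)
diameter = max(radii.values())  # the diameter is the largest node radius

for r, fraction in plot.items():
    print(f"radius {r}: {fraction:.2f} of nodes")
print("diameter:", diameter)
```

In this toy graph the central node 2 has the smallest radius and the path endpoints have the largest, which is the sense in which a Radius Plot separates central nodes from outliers. The per-node BFS costs time proportional to nodes times edges overall, which illustrates why an exact computation becomes prohibitive at the scale the abstract describes.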
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2010
Where SDM
Authors U. Kang, Charalampos E. Tsourakakis, Ana Paula Appel, Christos Faloutsos, Jure Leskovec