
SSDBM
2010
IEEE

Scalable Clustering Algorithm for N-Body Simulations in a Shared-Nothing Cluster

Abstract. Scientists’ ability to generate and collect massive-scale datasets is increasing. As a result, constraints in data analysis capability, rather than limitations in the availability of data, have become the bottleneck to scientific discovery. MapReduce-style platforms promise to address this growing data analysis problem, but many scientific analyses are not easy to express in these new frameworks. In this paper, we study data analysis challenges found in the astronomy simulation domain. In particular, we present a scalable, parallel algorithm for data clustering in this domain. Our algorithm makes two contributions. First, it shows how a clustering problem can be efficiently implemented in a MapReduce-style framework. Second, it includes optimizations that enable scalability, even in the presence of skew. We implement our solution in the Dryad parallel data processing system using DryadLINQ. We evaluate its performance and scalability using a real dataset compris...
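The abstract does not name the clustering criterion or the partitioning scheme. As a hedged illustration only, the sketch below assumes friends-of-friends (FoF) clustering, a standard way to find halos in N-body snapshots: two particles are "friends" if they lie within a linking length, and clusters are the connected components of that relation. It shows one way such a problem can be split into MapReduce-like phases (partition by spatial cell, cluster each partition locally, then merge clusters that touch across cell boundaries). It is single-process Python, not DryadLINQ, it does not attempt the paper's skew optimizations, and every function name here (`cell_of`, `local_fof`, `distributed_fof`) is hypothetical.

```python
import itertools
import math
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving; keys are created on first use."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cell_of(pos, cell_size):
    """Map step (hypothetical): key a particle by the grid cell containing it."""
    return tuple(math.floor(c / cell_size) for c in pos)


def local_fof(points, link):
    """Per-partition FoF. points: list of (pid, pos); returns {pid: local root}.

    Brute-force O(n^2) pair search, adequate for a sketch.
    """
    uf = UnionFind()
    for pid, _ in points:
        uf.find(pid)  # register singletons
    for (p, a), (q, b) in itertools.combinations(points, 2):
        if math.dist(a, b) <= link:
            uf.union(p, q)
    return {pid: uf.find(pid) for pid, _ in points}


def distributed_fof(particles, link):
    """FoF over {pid: (x, y, z)} in three MapReduce-like phases (assumed design)."""
    # Phase 1 (map): partition by grid cell. With cell edge == link, any pair
    # within the linking length lies in the same cell or an adjacent one.
    cells = defaultdict(list)
    for pid, pos in particles.items():
        cells[cell_of(pos, link)].append((pid, pos))

    # Phase 2 (reduce): cluster each partition independently.
    local = {}
    for points in cells.values():
        local.update(local_fof(points, link))

    # Phase 3 (merge): union local clusters linked across neighbouring cells.
    uf = UnionFind()
    for key, points in cells.items():
        for offset in itertools.product((-1, 0, 1), repeat=len(key)):
            nbr = tuple(k + o for k, o in zip(key, offset))
            if nbr <= key or nbr not in cells:
                continue  # skip self and visit each unordered cell pair once
            for p, a in points:
                for q, b in cells[nbr]:
                    if math.dist(a, b) <= link:
                        uf.union(local[p], local[q])
    return {pid: uf.find(local[pid]) for pid in particles}


if __name__ == "__main__":
    # Particles 0-1-2 form a chain that crosses a cell boundary; 3 is isolated.
    demo = {0: (0.0, 0.0, 0.0), 1: (0.9, 0.0, 0.0),
            2: (1.7, 0.0, 0.0), 3: (5.0, 5.0, 5.0)}
    print(distributed_fof(demo, link=1.0))
```

In the demo, particles 0 and 1 share a cell while particle 2 falls in the next cell, so they end up in one cluster only because the merge phase links clusters across the boundary. Choosing the cell edge equal to the linking length is what keeps the merge step local to adjacent cells; in a real shared-nothing deployment, each phase would run as a distributed operator rather than a Python loop.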
Added 10 Jul 2010
Updated 10 Jul 2010
Type Conference
Year 2010
Where SSDBM
Authors YongChul Kwon, Dylan Nunley, Jeffrey P. Gardner, Magdalena Balazinska, Bill Howe, Sarah Loebman