Analyzing massive astrophysical datasets: Can Pig/Hadoop or a relational DBMS help?

Abstract: As the datasets used to fuel modern scientific discovery grow increasingly large, they become increasingly difficult to manage using conventional software. Parallel database management systems (DBMSs) and massive-scale data processing systems such as MapReduce hold promise to address this challenge. However, since these systems have not been expressly designed for scientific applications, their efficacy in this domain has not been thoroughly tested. In this paper, we study the performance of these engines in one specific domain: massive astrophysical simulations. We develop a use case that comprises five representative queries. We implement this use case in one distributed DBMS and in the Pig/Hadoop system. We compare the performance of the tools to each other and to hand-written IDL scripts. We find that certain representative analyses are easy to express in each engine's high-level language and both systems provide competitive performance and improved scalability ...

Sarah Loebman, Dylan Nunley, YongChul Kwon, Bill H

CLUSTER 2009 | Cluster Computing | MapReduce Hold Promise | Massive Astrophysical Simulations | Modern Scientific Discovery

Post Info

Added	20 May 2010
Updated	20 May 2010
Type	Conference
Year	2009
Where	CLUSTER
Authors	Sarah Loebman, Dylan Nunley, YongChul Kwon, Bill Howe, Magdalena Balazinska, Jeffrey P. Gardner
