IEEE CLUSTER 2009

Analyzing massive astrophysical datasets: Can Pig/Hadoop or a relational DBMS help?

Abstract: As the datasets used to fuel modern scientific discovery grow increasingly large, they become increasingly difficult to manage using conventional software. Parallel database management systems (DBMSs) and massive-scale data processing systems such as MapReduce hold promise to address this challenge. However, since these systems have not been expressly designed for scientific applications, their efficacy in this domain has not been thoroughly tested. In this paper, we study the performance of these engines in one specific domain: massive astrophysical simulations. We develop a use case that comprises five representative queries. We implement this use case in one distributed DBMS and in the Pig/Hadoop system. We compare the performance of the tools to each other and to hand-written IDL scripts. We find that certain representative analyses are easy to express in each engine's high-level language and both systems provide competitive performance and improved scalability ...
Added 20 May 2010
Updated 20 May 2010
Type Conference
Year 2009
Where CLUSTER
Authors Sarah Loebman, Dylan Nunley, YongChul Kwon, Bill Howe, Magdalena Balazinska, Jeffrey P. Gardner