Sciweavers

SIGMOD
2012
ACM

Shark: fast data analysis using coarse-grained distributed memory

Shark is a research data analysis system built on a novel coarse-grained distributed shared-memory abstraction. Shark marries query processing with deep data analysis, providing a unified system for easy data manipulation using SQL and pushing sophisticated analysis closer to data. It scales to thousands of nodes in a fault-tolerant manner. Shark can answer queries 40X faster than Apache Hive and run machine learning programs 25X faster than MapReduce programs in Apache Hadoop on large datasets.
Categories and Subject Descriptors H.2 [Database Management]: Systems
General Terms Design, Management
Keywords Databases, Data Warehouse, Machine Learning, Resilient Distributed Dataset, Spark, Shark
Added 27 Sep 2012
Updated 27 Sep 2012
Type Journal
Year 2012
Where SIGMOD
Authors Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael J. Franklin, Scott Shenker, Ion Stoica