MRShare: Sharing Across Multiple Queries in MapReduce

Large-scale data analysis lies at the core of modern enterprises and scientific research. With the emergence of cloud computing, the use of an analytical query processing infrastructure (e.g., Amazon EC2) can be directly mapped to monetary value. MapReduce has been a popular framework in the context of cloud computing, designed to serve long-running queries (jobs) that can be processed in batch mode. Because different jobs often perform similar work, there are many opportunities for sharing. In principle, sharing similar work reduces the overall amount of work, which can in turn reduce the monetary charges incurred while using the processing infrastructure. In this paper we propose a sharing framework tailored to MapReduce. Our framework, MRShare, transforms a batch of queries into a new batch that will be executed more efficiently, by merging jobs into groups and evaluating each group as a single query. Based on our cost model for MapReduce, we define an optimi...
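The idea of evaluating a group of jobs as a single query can be illustrated with a small in-memory sketch. This is not the MRShare implementation or its cost model. It is a toy simulation, assuming two hypothetical jobs (`hits_per_user`, `bytes_per_url`) over the same hypothetical input, in which the merged job scans the input once and tags each map output with the job that produced it.

```python
from collections import defaultdict

def run_separately(records, jobs):
    """Run each job on its own: one full input scan per job."""
    scans = 0
    results = {}
    for name, (map_fn, reduce_fn) in jobs.items():
        scans += 1
        groups = defaultdict(list)
        for record in records:
            for key, value in map_fn(record):
                groups[key].append(value)
        results[name] = {k: reduce_fn(v) for k, v in groups.items()}
    return results, scans

def run_merged(records, jobs):
    """Run all jobs as one merged job: a single scan, outputs tagged by job."""
    groups = defaultdict(list)
    for record in records:
        # Every job's map function sees the record during the same scan.
        for name, (map_fn, _) in jobs.items():
            for key, value in map_fn(record):
                groups[(name, key)].append(value)
    results = {name: {} for name in jobs}
    for (name, key), values in groups.items():
        reduce_fn = jobs[name][1]
        results[name][key] = reduce_fn(values)
    return results, 1

# Hypothetical input and jobs, for illustration only.
records = [
    {"user": "a", "url": "/x", "bytes": 10},
    {"user": "b", "url": "/x", "bytes": 20},
    {"user": "a", "url": "/y", "bytes": 5},
]
jobs = {
    "hits_per_user": (lambda r: [(r["user"], 1)], sum),
    "bytes_per_url": (lambda r: [(r["url"], r["bytes"])], sum),
}

separate_results, separate_scans = run_separately(records, jobs)
merged_results, merged_scans = run_merged(records, jobs)
```

Both strategies produce the same per-job results, but the merged run reads the input once instead of once per job. The sketch merges every job unconditionally; according to the abstract, MRShare instead uses a cost model to decide how jobs are grouped.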
Tomasz Nykiel, Michalis Potamias, Chaitanya Mishra
Added 30 Jan 2011
Updated 30 Jan 2011
Type Journal
Year 2010
Authors Tomasz Nykiel, Michalis Potamias, Chaitanya Mishra, George Kollios, Nick Koudas