Sampling with incremental mapreduce

Abstract: The goal of this paper is to speed up MapReduce jobs by trading accuracy of the result for computation time. Often, timely processing is more important than the precision of the result. Hadoop has no built-in functionality for this kind of approximation, so users have to implement sampling techniques manually. We introduce an automatic system for computing arithmetic approximations. The sampling is based on techniques from statistics, and the extrapolation is done generically. The system is extended by an incremental component that reuses already computed results to enlarge the sample size. This can be applied iteratively to further increase the sample size and, with it, the precision of the approximation. We present a transparent incremental sampling approach, so the developed components can be integrated into the Hadoop framework in a non-invasive manner.
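The idea of extrapolating from a sample and later enlarging that sample without redoing earlier work can be illustrated with a minimal sketch. This is not the authors' system and does not use Hadoop; it assumes a simple random sample without replacement (a prefix of a fixed random permutation), a sum as the aggregate, and linear scaling by N/n as the extrapolation. The class name `IncrementalSumEstimator` and its methods are hypothetical.

```python
import random


class IncrementalSumEstimator:
    """Approximates sum(values) from a growing random sample (hypothetical helper)."""

    def __init__(self, values, seed=0):
        # Fix one random order up front: every prefix of it is a simple
        # random sample without replacement (assumption of this sketch).
        self._order = list(values)
        random.Random(seed).shuffle(self._order)
        self._processed = 0      # number of records aggregated so far
        self._partial_sum = 0.0  # stored result that later runs reuse

    def enlarge(self, fraction):
        """Grow the sample to `fraction` of the input, processing only new records."""
        total = len(self._order)
        target = min(total, max(self._processed, round(fraction * total)))
        for value in self._order[self._processed:target]:
            self._partial_sum += value
        self._processed = target
        return self.estimate()

    def estimate(self):
        """Extrapolate the sampled sum to the full input by scaling with N/n."""
        if self._processed == 0:
            raise ValueError("no records sampled yet")
        return self._partial_sum * len(self._order) / self._processed


if __name__ == "__main__":
    estimator = IncrementalSumEstimator(range(1, 1001))
    print("10% sample: ", estimator.enlarge(0.10))
    print("50% sample: ", estimator.enlarge(0.50))  # reuses the first 10%
    print("100% sample:", estimator.enlarge(1.00))  # exact result
```

Each call to `enlarge` touches only the records that were not yet part of the sample, which mirrors the reuse of already computed results described in the abstract. With the full input sampled, the estimate equals the exact sum.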
Marc Schäfer, Johannes Schildgen, Stefan Deßloch
Added 17 Apr 2016
Updated 17 Apr 2016
Type Journal
Year 2015
Where BTW
Authors Marc Schäfer, Johannes Schildgen, Stefan Deßloch