RAFTing MapReduce: Fast recovery on the RAFT

MapReduce is a computing paradigm that has gained wide popularity because it allows non-expert users to easily run complex analytical tasks at very large scale. At such scale, task and node failures are no longer an exception but rather a characteristic of these systems, which makes fault tolerance critical for the efficient operation of any application. MapReduce automatically reschedules failed tasks to available nodes, which in turn recompute such tasks from scratch. However, this policy can significantly decrease the performance of applications. In this paper, we propose a family of Recovery Algorithms for Fast-Tracking (RAFT) MapReduce. As ease of use is a major feature of MapReduce, RAFT focuses on simplicity and non-intrusiveness in order to be implementation-independent. To recover efficiently from task failures, RAFT exploits the fact that MapReduce produces and persists intermediate results at several points in time. RAFT piggy-backs checkpoints on the task progre...
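To make the contrast in the abstract concrete, the following toy sketch compares recomputing a failed task from scratch with resuming it from the last persisted intermediate result. It is not the RAFT algorithms themselves, whose details are not given here; the names `Checkpoint`, `run_attempt`, and `run_with_recovery`, the fixed checkpoint interval, and the single simulated failure are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Hypothetical checkpoint: input offset plus the output persisted so far."""
    offset: int = 0
    output: list = field(default_factory=list)


def run_attempt(records, map_fn, ckpt, every, fail_at=None):
    """Run one task attempt starting at ckpt.offset.

    Returns (completed, records_processed). A simulated failure at index
    `fail_at` discards any results not yet persisted to the checkpoint.
    """
    work = 0
    pending = []  # results produced since the last checkpoint
    for i in range(ckpt.offset, len(records)):
        if i == fail_at:
            return False, work
        pending.append(map_fn(records[i]))
        work += 1
        if (i + 1) % every == 0 or i + 1 == len(records):
            # Persist pending results and advance the offset together.
            ckpt.output.extend(pending)
            ckpt.offset = i + 1
            pending = []
    return True, work


def run_with_recovery(records, map_fn, every, fail_at, resume):
    """Run a task that fails once, then reschedule it.

    With resume=True the second attempt continues from the checkpoint;
    with resume=False it recomputes everything (the baseline policy).
    Returns (output, total_records_processed).
    """
    ckpt = Checkpoint()
    ok, total = run_attempt(records, map_fn, ckpt, every, fail_at)
    if not ok:
        if not resume:
            ckpt = Checkpoint()  # baseline: throw away all progress
        ok, work = run_attempt(records, map_fn, ckpt, every)
        total += work
    return ckpt.output, total


if __name__ == "__main__":
    data = list(range(10))
    square = lambda x: x * x
    print(run_with_recovery(data, square, every=3, fail_at=8, resume=True)[1])
    print(run_with_recovery(data, square, every=3, fail_at=8, resume=False)[1])
```

In this toy setting both policies produce the same output, but the resumed run reprocesses only the records after the last checkpoint, while the baseline reprocesses the whole input.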
Jorge-Arnulfo Quiané-Ruiz, Christoph Pinkel
Added 21 Aug 2011
Updated 21 Aug 2011
Type Conference
Year 2011
Where ICDE
Authors Jorge-Arnulfo Quiané-Ruiz, Christoph Pinkel, Jörg Schad, Jens Dittrich