RAFT at work: Speeding-up MapReduce applications under task and node failures

Jorge Arnulfo Quiane Ruiz, Christoph Pinkel, Jörg Schad, Jens Dittrich

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Citations (Scopus)

Abstract

The MapReduce framework is typically deployed on very large computing clusters where task and node failures are no longer an exception but the rule. Thus, fault-tolerance is an important aspect for the efficient operation of MapReduce jobs. However, current MapReduce implementations fully recompute failed tasks (subparts of a job) from the beginning. This can significantly decrease the runtime performance of MapReduce applications. We present an alternative system that implements RAFT ideas. RAFT is a family of powerful and inexpensive Recovery Algorithms for Fast-Tracking MapReduce jobs under task and node failures. To recover from task failures, RAFT exploits the intermediate results persisted by MapReduce at several points in time. RAFT piggybacks checkpoints on the task progress computation. To recover from node failures, RAFT maintains a per-map-task list of all input key-value pairs producing intermediate results and pushes intermediate results to reducers. In this demo, we demonstrate that RAFT recovers efficiently from both task and node failures. Further, the audience can compare RAFT with Hadoop via an easy-to-use web interface.
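The task-failure recovery idea in the abstract can be illustrated with a small sketch. This is not RAFT's or Hadoop's code: the function names, the dictionary used as a stand-in for persistent storage, and the `spill_every` and `fail_at` parameters are all assumptions made for illustration. It only shows the general pattern of recording a checkpoint whenever intermediate results are persisted, so that a restarted task attempt resumes from the last checkpoint instead of recomputing from the beginning.

```python
# Illustrative sketch only: every name here is hypothetical and does not
# correspond to the actual RAFT or Hadoop API.


class TaskFailure(Exception):
    """Simulated failure of a map task attempt."""


def run_map_task(records, map_fn, store, spill_every=3, fail_at=None):
    """Run one attempt of a map task, resuming from the last checkpoint.

    `store` stands in for persistent storage. It holds the intermediate
    output persisted so far and the input offset that output covers.
    Returns the number of records processed during this attempt.
    """
    start = store.get("offset", 0)            # resume point, 0 on first attempt
    output = list(store.get("output", []))    # results already persisted
    processed = 0
    for i in range(start, len(records)):
        if fail_at is not None and i == fail_at:
            raise TaskFailure(f"simulated failure at record {i}")
        output.extend(map_fn(records[i]))
        processed += 1
        if (i + 1) % spill_every == 0:
            # Persist intermediate results and record the checkpoint together.
            store["output"] = list(output)
            store["offset"] = i + 1
    # Final persist when the task completes.
    store["output"] = list(output)
    store["offset"] = len(records)
    return processed


def word_pairs(line):
    """Toy map function: emit (word, 1) for each word in the line."""
    return [(word, 1) for word in line.split()]


records = ["a b", "c", "d e", "f", "g h", "i", "j"]
store = {}

# First attempt fails before processing record 4.
try:
    run_map_task(records, word_pairs, store, fail_at=4)
except TaskFailure:
    pass
offset_after_failure = store["offset"]

# Second attempt resumes from the checkpoint rather than from record 0.
reprocessed = run_map_task(records, word_pairs, store)
```

In this toy run the first attempt checkpoints after three records, so the retry only processes the remaining four. Work done after the last checkpoint (record 3 here) is lost and redone, which is the trade-off any checkpoint interval implies.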

Original language: English
Title of host publication: Proceedings of the ACM SIGMOD International Conference on Management of Data
Pages: 1225-1227
Number of pages: 3
DOIs: https://doi.org/10.1145/1989323.1989460
Publication status: Published - 11 Jul 2011
Externally published: Yes
Event: 2011 ACM SIGMOD and 30th PODS 2011 Conference - Athens, Greece
Duration: 12 Jun 2011 to 16 Jun 2011

Other

Other: 2011 ACM SIGMOD and 30th PODS 2011 Conference
Country: Greece
City: Athens
Period: 12/6/11 to 16/6/11

Keywords

  • checkpointing
  • fault-tolerance
  • hadoop
  • mapreduce
  • node failures
  • recovery

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Quiane Ruiz, J. A., Pinkel, C., Schad, J., & Dittrich, J. (2011). RAFT at work: Speeding-up MapReduce applications under task and node failures. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 1225-1227). https://doi.org/10.1145/1989323.1989460