Only aggressive elephants are fast elephants

Jens Dittrich, Jorge Arnulfo Quiane Ruiz, Stefan Richter, Stefan Schuh, Alekh Jindal, Jörg Schad

Research output: Chapter in Book/Report/Conference proceeding › Chapter

62 Citations (Scopus)

Abstract

Yellow elephants are slow. A major reason is that they consume their inputs entirely before responding to an elephant rider's orders. Some clever riders have trained their yellow elephants to only consume parts of the inputs before responding. However, the teaching time to make an elephant do that is high. So high that the teaching lessons often do not pay off. We take a different approach. We make elephants aggressive; only this will make them very fast. We propose HAIL (Hadoop Aggressive Indexing Library), an enhancement of HDFS and Hadoop MapReduce that dramatically improves runtimes of several classes of MapReduce jobs. HAIL changes the upload pipeline of HDFS in order to create different clustered indexes on each data block replica. An interesting feature of HAIL is that we typically create a win-win situation: we improve both data upload to HDFS and the runtime of the actual Hadoop MapReduce job. In terms of data upload, HAIL improves over HDFS by up to 60% with the default replication factor of three. In terms of query execution, we demonstrate that HAIL runs up to 68x faster than Hadoop. In our experiments, we use six clusters including physical and EC2 clusters of up to 100 nodes. A series of scalability experiments also demonstrates the superiority of HAIL.
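
The central idea in the abstract, a different clustered index on each replica of a data block, can be illustrated with a minimal in-memory sketch. This is not HAIL's API or implementation: the function names `build_replicas` and `range_query`, the sample records, and the use of sorted Python lists in place of HDFS block replicas are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def build_replicas(block, index_attrs):
    """Create one copy of the block per attribute, each sorted on that attribute.

    Hypothetical stand-in for storing differently indexed replicas;
    returns {attr: (sorted_keys, sorted_rows)}.
    """
    replicas = {}
    for attr in index_attrs:
        rows = sorted(block, key=lambda r: r[attr])
        keys = [r[attr] for r in rows]
        replicas[attr] = (keys, rows)
    return replicas


def range_query(replicas, attr, low, high):
    """Return rows with low <= row[attr] <= high.

    If a replica is sorted on attr, binary search locates the range;
    otherwise fall back to a full scan of an arbitrary replica.
    """
    if attr in replicas:
        keys, rows = replicas[attr]
        return rows[bisect_left(keys, low):bisect_right(keys, high)]
    _, rows = next(iter(replicas.values()))
    return [r for r in rows if low <= r[attr] <= high]


# Illustrative data only; three replicas mirror the default replication factor.
block = [
    {"id": 3, "age": 41, "city": "Berlin"},
    {"id": 1, "age": 29, "city": "Aachen"},
    {"id": 2, "age": 35, "city": "Mainz"},
]
replicas = build_replicas(block, ["id", "age", "city"])
hits = range_query(replicas, "age", 30, 45)
```

In this sketch a selection on `age` is answered from the replica sorted on `age`, so only the matching range is read, while a selection on an unindexed attribute still works by scanning one replica in full.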

Original language: English
Title of host publication: Proceedings of the VLDB Endowment
Pages: 1591-1602
Number of pages: 12
Volume: 5
Edition: 11
Publication status: Published - Jul 2012
Externally published: Yes

Fingerprint

Teaching
Scalability
Pipelines
Experiments

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)

Cite this

Dittrich, J., Quiane Ruiz, J. A., Richter, S., Schuh, S., Jindal, A., & Schad, J. (2012). Only aggressive elephants are fast elephants. In Proceedings of the VLDB Endowment (11 ed., Vol. 5, pp. 1591-1602)

@inbook{ef3ddeb6cbe74885ace8c94bce091dc4,
title = "Only aggressive elephants are fast elephants",
abstract = "Yellow elephants are slow. A major reason is that they consume their inputs entirely before responding to an elephant rider's orders. Some clever riders have trained their yellow elephants to only consume parts of the inputs before responding. However, the teaching time to make an elephant do that is high. So high that the teaching lessons often do not pay off. We take a different approach. We make elephants aggressive; only this will make them very fast. We propose HAIL (Hadoop Aggressive Indexing Library), an enhancement of HDFS and Hadoop MapReduce that dramatically improves runtimes of several classes of MapReduce jobs. HAIL changes the upload pipeline of HDFS in order to create different clustered indexes on each data block replica. An interesting feature of HAIL is that we typically create a win-win situation: we improve both data upload to HDFS and the runtime of the actual Hadoop MapReduce job. In terms of data upload, HAIL improves over HDFS by up to 60{\%} with the default replication factor of three. In terms of query execution, we demonstrate that HAIL runs up to 68x faster than Hadoop. In our experiments, we use six clusters including physical and EC2 clusters of up to 100 nodes. A series of scalability experiments also demonstrates the superiority of HAIL.",
author = "Jens Dittrich and {Quiane Ruiz}, {Jorge Arnulfo} and Stefan Richter and Stefan Schuh and Alekh Jindal and J{\"o}rg Schad",
year = "2012",
month = "7",
language = "English",
volume = "5",
pages = "1591--1602",
booktitle = "Proceedings of the VLDB Endowment",
edition = "11",

}

TY - CHAP

T1 - Only aggressive elephants are fast elephants

AU - Dittrich, Jens

AU - Quiane Ruiz, Jorge Arnulfo

AU - Richter, Stefan

AU - Schuh, Stefan

AU - Jindal, Alekh

AU - Schad, Jörg

PY - 2012/7

Y1 - 2012/7

AB - Yellow elephants are slow. A major reason is that they consume their inputs entirely before responding to an elephant rider's orders. Some clever riders have trained their yellow elephants to only consume parts of the inputs before responding. However, the teaching time to make an elephant do that is high. So high that the teaching lessons often do not pay off. We take a different approach. We make elephants aggressive; only this will make them very fast. We propose HAIL (Hadoop Aggressive Indexing Library), an enhancement of HDFS and Hadoop MapReduce that dramatically improves runtimes of several classes of MapReduce jobs. HAIL changes the upload pipeline of HDFS in order to create different clustered indexes on each data block replica. An interesting feature of HAIL is that we typically create a win-win situation: we improve both data upload to HDFS and the runtime of the actual Hadoop MapReduce job. In terms of data upload, HAIL improves over HDFS by up to 60% with the default replication factor of three. In terms of query execution, we demonstrate that HAIL runs up to 68x faster than Hadoop. In our experiments, we use six clusters including physical and EC2 clusters of up to 100 nodes. A series of scalability experiments also demonstrates the superiority of HAIL.

UR - http://www.scopus.com/inward/record.url?scp=84873131743&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84873131743&partnerID=8YFLogxK

M3 - Chapter

VL - 5

SP - 1591

EP - 1602

BT - Proceedings of the VLDB Endowment

ER -