Hadoop++: Making a yellow elephant run like a cheetah (without it even noticing)

Jens Dittrich, Jorge Arnulfo Quiane Ruiz, Alekh Jindal, Yagiz Kargin, Vinay Setty, Jörg Schad

Research output: Chapter in Book/Report/Conference proceeding › Chapter

297 Citations (Scopus)

Abstract

MapReduce is a computing paradigm that has gained a lot of attention in recent years from industry and research. Unlike parallel DBMSs, MapReduce allows non-expert users to run complex analytical tasks over very large data sets on very large clusters and clouds. However, this comes at a price: MapReduce processes tasks in a scan-oriented fashion. Hence, the performance of Hadoop, an open-source implementation of MapReduce, often does not match that of a well-configured parallel DBMS. In this paper we propose a new type of system named Hadoop++: it boosts task performance without changing the Hadoop framework at all (Hadoop does not even 'notice it'). To reach this goal, rather than changing a working system (Hadoop), we inject our technology at the right places through UDFs only and affect Hadoop from the inside. This has three important consequences. First, Hadoop++ significantly outperforms Hadoop. Second, any future changes to Hadoop may be used directly with Hadoop++ without rewriting any glue code. Third, Hadoop++ does not need to change the Hadoop interface. Our experiments show the superiority of Hadoop++ over both Hadoop and HadoopDB for tasks related to indexing and join processing.

Original language: English
Title of host publication: Proceedings of the VLDB Endowment
Pages: 518-529
Number of pages: 12
Volume: 3
Edition: 1
Publication status: Published - Sep 2010
Externally published: Yes

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)

Cite this

Dittrich, J., Quiane Ruiz, J. A., Jindal, A., Kargin, Y., Setty, V., & Schad, J. (2010). Hadoop++: Making a yellow elephant run like a cheetah (without it even noticing). In Proceedings of the VLDB Endowment (1 ed., Vol. 3, pp. 518-529).