Efficient big data processing in Hadoop MapReduce

Research output: Chapter in Book/Report/Conference proceeding › Chapter

122 Citations (Scopus)

Abstract

This tutorial is motivated by the clear need of many organizations, companies, and researchers to deal with big data volumes efficiently. Examples include web analytics applications, scientific applications, and social networks. A popular data processing engine for big data is Hadoop MapReduce. Early versions of Hadoop MapReduce suffered from severe performance problems. Today, this is becoming history. There are many techniques that can be used with Hadoop MapReduce jobs to boost performance by orders of magnitude. In this tutorial we teach such techniques. First, we will briefly familiarize the audience with Hadoop MapReduce and motivate its use for big data processing. Then, we will focus on different data management techniques, going from job optimization to physical data organization like data layouts and indexes. Throughout this tutorial, we will highlight the similarities and differences between Hadoop MapReduce and Parallel DBMS. Furthermore, we will point out unresolved research problems and open issues.

Original language: English
Title of host publication: Proceedings of the VLDB Endowment
Pages: 2014-2015
Number of pages: 2
Volume: 5
Edition: 12
Publication status: Published - Aug 2012
Externally published: Yes

Fingerprint

Information management
Engines
Big data
Industry

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)

Cite this

Dittrich, J., & Quiane Ruiz, J. A. (2012). Efficient big data processing in Hadoop MapReduce. In Proceedings of the VLDB Endowment (12 ed., Vol. 5, pp. 2014-2015).

@inbook{8c3c767a5a4241a0973eeb7e13063715,
title = "Efficient big data processing in Hadoop MapReduce",
author = "Jens Dittrich and {Quiane Ruiz}, {Jorge Arnulfo}",
year = "2012",
month = "8",
language = "English",
volume = "5",
pages = "2014--2015",
booktitle = "Proceedings of the VLDB Endowment",
edition = "12",

}

TY - CHAP

T1 - Efficient big data processing in Hadoop MapReduce

AU - Dittrich, Jens

AU - Quiane Ruiz, Jorge Arnulfo

PY - 2012/8

Y1 - 2012/8

UR - http://www.scopus.com/inward/record.url?scp=84873140259&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84873140259&partnerID=8YFLogxK

M3 - Chapter

VL - 5

SP - 2014

EP - 2015

BT - Proceedings of the VLDB Endowment

ER -