PStorM: Profile storage and matching for feedback-based tuning of MapReduce jobs

Mostafa Ead, Ashraf Aboulnaga, Herodotos Herodotou, Shivnath Babu

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

3 Citations (Scopus)

Abstract

The MapReduce programming model has become widely adopted for large-scale analytics on big data. MapReduce systems such as Hadoop have many tuning parameters, many of which have a significant impact on performance. The map and reduce functions that make up a MapReduce job are developed using arbitrary programming constructs, which makes them black-box in nature and therefore makes it difficult for users and administrators to make good parameter tuning decisions for a submitted MapReduce job. An approach that is gaining popularity is to provide automatic tuning decisions for submitted MapReduce jobs based on feedback from previously executed jobs. This approach is adopted, for example, by the Starfish system. Starfish and similar systems base their tuning decisions on an execution profile of the MapReduce job being tuned. This execution profile contains summary information about the runtime behavior of the job being tuned, and it is assumed to come from a previous execution of the same job. Managing these execution profiles has not been previously studied. This paper presents PStorM, a profile store and matcher that accurately chooses the relevant profiling information for tuning a submitted MapReduce job from the previously collected profiling information. PStorM can identify accurate tuning profiles even for previously unseen MapReduce jobs. PStorM is currently integrated with the Starfish system, although it can be extended to work with any MapReduce tuning system. Experiments on a large number of MapReduce jobs demonstrate the accuracy and efficiency of profile matching. The results of these experiments show that the profiles returned by PStorM result in tuning decisions that are as good as decisions based on exact profiles collected during previous executions of the tuned jobs. This holds even for previously unseen jobs, which significantly reduces the overhead of feedback-driven profile-based MapReduce tuning.

Original language: English
Title of host publication: Advances in Database Technology - EDBT 2014: 17th International Conference on Extending Database Technology, Proceedings
Publisher: OpenProceedings.org, University of Konstanz, University Library
Pages: 1-12
Number of pages: 12
ISBN (Electronic): 9783893180653
DOI: https://doi.org/10.5441/002/edbt.2014.02
Publication status: Published - 2014
Event: 17th International Conference on Extending Database Technology, EDBT 2014 - Athens, Greece
Duration: 24 Mar 2014 to 28 Mar 2014

Other

Other: 17th International Conference on Extending Database Technology, EDBT 2014
Country: Greece
City: Athens
Period: 24/3/14 to 28/3/14

Fingerprint

Tuning
Feedback
Experiments

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Software

Cite this

Ead, M., Aboulnaga, A., Herodotou, H., & Babu, S. (2014). PStorM: Profile storage and matching for feedback-based tuning of MapReduce jobs. In Advances in Database Technology - EDBT 2014: 17th International Conference on Extending Database Technology, Proceedings (pp. 1-12). OpenProceedings.org, University of Konstanz, University Library. https://doi.org/10.5441/002/edbt.2014.02

@inproceedings{ce15894d5c004b9c9aecbfe5d38b5649,
title = "PStorM: Profile storage and matching for feedback-based tuning of MapReduce jobs",
abstract = "The MapReduce programming model has become widely adopted for large-scale analytics on big data. MapReduce systems such as Hadoop have many tuning parameters, many of which have a significant impact on performance. The map and reduce functions that make up a MapReduce job are developed using arbitrary programming constructs, which makes them black-box in nature and therefore makes it difficult for users and administrators to make good parameter tuning decisions for a submitted MapReduce job. An approach that is gaining popularity is to provide automatic tuning decisions for submitted MapReduce jobs based on feedback from previously executed jobs. This approach is adopted, for example, by the Starfish system. Starfish and similar systems base their tuning decisions on an execution profile of the MapReduce job being tuned. This execution profile contains summary information about the runtime behavior of the job being tuned, and it is assumed to come from a previous execution of the same job. Managing these execution profiles has not been previously studied. This paper presents PStorM, a profile store and matcher that accurately chooses the relevant profiling information for tuning a submitted MapReduce job from the previously collected profiling information. PStorM can identify accurate tuning profiles even for previously unseen MapReduce jobs. PStorM is currently integrated with the Starfish system, although it can be extended to work with any MapReduce tuning system. Experiments on a large number of MapReduce jobs demonstrate the accuracy and efficiency of profile matching. The results of these experiments show that the profiles returned by PStorM result in tuning decisions that are as good as decisions based on exact profiles collected during previous executions of the tuned jobs. This holds even for previously unseen jobs, which significantly reduces the overhead of feedback-driven profile-based MapReduce tuning.",
author = "Mostafa Ead and Ashraf Aboulnaga and Herodotos Herodotou and Shivnath Babu",
year = "2014",
doi = "10.5441/002/edbt.2014.02",
language = "English",
pages = "1--12",
booktitle = "Advances in Database Technology - EDBT 2014: 17th International Conference on Extending Database Technology, Proceedings",
publisher = "OpenProceedings.org, University of Konstanz, University Library",

}

TY - GEN

T1 - PStorM

T2 - Profile storage and matching for feedback-based tuning of MapReduce jobs

AU - Ead, Mostafa

AU - Aboulnaga, Ashraf

AU - Herodotou, Herodotos

AU - Babu, Shivnath

PY - 2014

Y1 - 2014

N2 - The MapReduce programming model has become widely adopted for large-scale analytics on big data. MapReduce systems such as Hadoop have many tuning parameters, many of which have a significant impact on performance. The map and reduce functions that make up a MapReduce job are developed using arbitrary programming constructs, which makes them black-box in nature and therefore makes it difficult for users and administrators to make good parameter tuning decisions for a submitted MapReduce job. An approach that is gaining popularity is to provide automatic tuning decisions for submitted MapReduce jobs based on feedback from previously executed jobs. This approach is adopted, for example, by the Starfish system. Starfish and similar systems base their tuning decisions on an execution profile of the MapReduce job being tuned. This execution profile contains summary information about the runtime behavior of the job being tuned, and it is assumed to come from a previous execution of the same job. Managing these execution profiles has not been previously studied. This paper presents PStorM, a profile store and matcher that accurately chooses the relevant profiling information for tuning a submitted MapReduce job from the previously collected profiling information. PStorM can identify accurate tuning profiles even for previously unseen MapReduce jobs. PStorM is currently integrated with the Starfish system, although it can be extended to work with any MapReduce tuning system. Experiments on a large number of MapReduce jobs demonstrate the accuracy and efficiency of profile matching. The results of these experiments show that the profiles returned by PStorM result in tuning decisions that are as good as decisions based on exact profiles collected during previous executions of the tuned jobs. This holds even for previously unseen jobs, which significantly reduces the overhead of feedback-driven profile-based MapReduce tuning.

UR - http://www.scopus.com/inward/record.url?scp=84978739899&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84978739899&partnerID=8YFLogxK

U2 - 10.5441/002/edbt.2014.02

DO - 10.5441/002/edbt.2014.02

M3 - Conference contribution

AN - SCOPUS:84978739899

SP - 1

EP - 12

BT - Advances in Database Technology - EDBT 2014: 17th International Conference on Extending Database Technology, Proceedings

PB - OpenProceedings.org, University of Konstanz, University Library

ER -