Packing the most onto your cloud

Ashraf Aboulnaga, Ziyu Wang, Zi Ye Zhang

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

12 Citations (Scopus)

Abstract

Parallel dataflow programming frameworks such as Map-Reduce are increasingly being used for large-scale data analysis on computing clouds, so it is becoming important to automatically optimize the performance of these frameworks. In this paper, we address one particular optimization problem: scheduling sets of Map-Reduce jobs on a cluster of machines. We present a scheduler that takes job characteristics into account and finds a schedule that minimizes the total completion time of the set of jobs. Our scheduler decides how many machines to assign to each job, and it packs as many jobs onto the machines as the machine resources can support. To enable flexible assignment of jobs to machines, we run the Map-Reduce jobs in virtual machines. We formulate the scheduling problem as a constrained optimization problem, and we demonstrate experimentally, using the Hadoop open-source Map-Reduce implementation, that solving this problem yields performance benefits of up to 30%.
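
As an illustration of the scheduling idea described above, the following Python sketch allocates a fixed pool of machines (virtual machine slots) across a set of Map-Reduce jobs so that the estimated total completion time is minimized without exceeding the cluster's capacity. This is only a minimal sketch under assumed inputs: the job names, the Amdahl-style runtime model, and the cluster size are hypothetical, and the brute-force enumeration stands in for the paper's constrained-optimization formulation, which is not reproduced here.

    from itertools import product

    # Hypothetical job profiles: (serial_fraction, runtime_on_one_machine_in_seconds).
    # These numbers are illustrative only, not measurements from the paper.
    JOBS = {
        "sort":      (0.05, 1200.0),
        "wordcount": (0.10, 800.0),
        "join":      (0.20, 1500.0),
    }

    CLUSTER_SIZE = 8  # total machines (VM slots) available; hypothetical

    def estimated_runtime(serial_fraction, base_runtime, machines):
        """Amdahl-style estimate of a job's completion time on `machines` machines.

        This stands in for a performance model built from job characteristics;
        the paper's actual model is not reproduced here.
        """
        return base_runtime * (serial_fraction + (1.0 - serial_fraction) / machines)

    def best_allocation(jobs, cluster_size):
        """Try every allocation of machines to jobs (each job gets at least one
        machine, and the allocation must fit in the cluster) and return the one
        with the smallest total estimated completion time.

        Brute force is only feasible for a handful of jobs; it stands in for the
        constrained-optimization solver an actual scheduler would use.
        """
        names = list(jobs)
        best_total, best_alloc = None, None
        for alloc in product(range(1, cluster_size + 1), repeat=len(names)):
            if sum(alloc) > cluster_size:  # capacity constraint
                continue
            total = sum(
                estimated_runtime(*jobs[name], machines=m)
                for name, m in zip(names, alloc)
            )
            if best_total is None or total < best_total:
                best_total, best_alloc = total, dict(zip(names, alloc))
        return best_total, best_alloc

    if __name__ == "__main__":
        total, allocation = best_allocation(JOBS, CLUSTER_SIZE)
        print(f"Total estimated completion time: {total:.1f} s")
        for name, machines in allocation.items():
            print(f"  {name}: {machines} machine(s)")

In the paper itself, the scheduler relies on job characteristics and a constrained-optimization formulation rather than brute-force enumeration over a toy model, which is what allows it to scale beyond a handful of jobs.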

Original language: English
Title of host publication: International Conference on Information and Knowledge Management, Proceedings
Pages: 25-28
Number of pages: 4
ISBN (Print): 9781605588025
DOI: 10.1145/1651263.1651268
Publication status: Published - 1 Dec 2009
Externally published: Yes
Event: 1st International Workshop on Cloud Data Management, CloudDB 2009, Co-located with the 18th ACM International Conference on Information and Knowledge Management, CIKM 2009 - Hong Kong, China
Duration: 2 Nov 2009 - 6 Nov 2009

Keywords

  • Hadoop
  • Performance modeling
  • Scheduling
  • Virtual machines

ASJC Scopus subject areas

  • Business, Management and Accounting (all)
  • Decision Sciences (all)

Cite this

Aboulnaga, A., Wang, Z., & Zhang, Z. Y. (2009). Packing the most onto your cloud. In International Conference on Information and Knowledge Management, Proceedings (pp. 25-28). https://doi.org/10.1145/1651263.1651268
