Packing the most onto your cloud

Ashraf Aboulnaga, Ziyu Wang, Zi Ye Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Citations (Scopus)

Abstract

Parallel dataflow programming frameworks such as Map-Reduce are increasingly being used for large-scale data analysis on computing clouds. It is therefore becoming important to automatically optimize the performance of these frameworks. In this paper, we address one particular optimization problem, namely scheduling sets of Map-Reduce jobs on a cluster of machines. We present a scheduler that takes job characteristics into account and finds a schedule that minimizes the total completion time of the set of jobs. Our scheduler decides on the number of machines to assign to each job, and it tries to pack as many jobs onto the machines as the machine resources can support. To enable flexible assignment of jobs onto machines, we run the Map-Reduce jobs in virtual machines. Our scheduling problem is formulated as a constrained optimization problem, and we demonstrate experimentally, using the open-source Hadoop implementation of Map-Reduce, that solving this problem yields benefits of up to 30%.
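To make the idea of scheduling as constrained optimization more concrete, the following is a minimal illustrative sketch and not the scheduler described in the paper. It assumes a hypothetical performance model (`job_time`: a fixed overhead plus work divided evenly across machines), assumes all jobs run concurrently, and takes the completion time of the set to be the time of the slowest job. The function names, the cost model, and the sample numbers are all assumptions made for illustration; the abstract does not specify any of them.

```python
from itertools import product


def job_time(work, overhead, machines):
    """Hypothetical cost model: fixed overhead plus evenly divided work."""
    return overhead + work / machines


def best_allocation(jobs, total_machines):
    """Exhaustively search machine counts per job.

    jobs: list of (work, overhead) tuples (illustrative units).
    Constraint: every job gets at least one machine and the counts
    sum to at most total_machines.
    Returns (completion_time, allocation), or None if infeasible.
    """
    best = None
    counts = range(1, total_machines + 1)
    for alloc in product(counts, repeat=len(jobs)):
        if sum(alloc) > total_machines:
            continue  # violates the capacity constraint
        # Jobs run side by side, so the set finishes with the slowest job.
        finish = max(job_time(w, o, n) for (w, o), n in zip(jobs, alloc))
        if best is None or finish < best[0]:
            best = (finish, alloc)
    return best


# Three made-up jobs sharing a seven-machine cluster.
jobs = [(120.0, 5.0), (60.0, 5.0), (30.0, 5.0)]
result = best_allocation(jobs, total_machines=7)
print(result)
```

Exhaustive search is only practical for a handful of jobs and machines; a realistic scheduler would need a measured performance model and a proper solver, neither of which this sketch attempts to reproduce.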

Original language: English
Title of host publication: 1st International Workshop on Cloud Data Management, CloudDB 2009, Co-located with the 18th ACM International Conference on Information and Knowledge Management, CIKM 2009
Pages: 25-28
Number of pages: 4
DOIs: https://doi.org/10.1145/1651263.1651268
Publication status: Published - 1 Dec 2009
Event: 1st International Workshop on Cloud Data Management, CloudDB 2009, Co-located with the 18th ACM International Conference on Information and Knowledge Management, CIKM 2009 - Hong Kong, China
Duration: 2 Nov 2009 to 6 Nov 2009

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Other

Other: 1st International Workshop on Cloud Data Management, CloudDB 2009, Co-located with the 18th ACM International Conference on Information and Knowledge Management, CIKM 2009
Country: China
City: Hong Kong
Period: 2/11/09 to 6/11/09

Keywords

  • Hadoop
  • Performance modeling
  • Scheduling
  • Virtual machines

ASJC Scopus subject areas

  • Decision Sciences (all)
  • Business, Management and Accounting (all)

Cite this

Aboulnaga, A., Wang, Z., & Zhang, Z. Y. (2009). Packing the most onto your cloud. In 1st International Workshop on Cloud Data Management, CloudDB 2009, Co-located with the 18th ACM International Conference on Information and Knowledge Management, CIKM 2009 (pp. 25-28). (International Conference on Information and Knowledge Management, Proceedings). https://doi.org/10.1145/1651263.1651268