MOON: MapReduce on opportunistic eNvironments

Heshan Lin, Xiaosong Ma, Jeremy Archuleta, Wu Chun Feng, Mark Gardner, Zhe Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

107 Citations (Scopus)

Abstract

MapReduce offers an ease-of-use programming paradigm for processing large data sets, making it an attractive model for distributed volunteer computing systems. However, unlike on dedicated resources, where MapReduce has mostly been deployed, such volunteer computing systems have significantly higher rates of node unavailability. Furthermore, nodes are not fully controlled by the MapReduce framework. Consequently, we found the data and task replication scheme adopted by existing MapReduce implementations woefully inadequate for resources with high unavailability. To address this, we propose MOON, short for MapReduce On Opportunistic eNvironments. MOON extends Hadoop, an open-source implementation of MapReduce, with adaptive task and data scheduling algorithms in order to offer reliable MapReduce services on a hybrid resource architecture, where volunteer computing systems are supplemented by a small set of dedicated nodes. Our tests on an emulated volunteer computing system, which uses a 60-node cluster where each node possesses a similar hardware configuration to a typical computer in a student lab, demonstrate that MOON can deliver a three-fold performance improvement to Hadoop in volatile, volunteer computing environments.

Original language: English
Title of host publication: HPDC 2010 - Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Pages: 95-106
Number of pages: 12
DOI: https://doi.org/10.1145/1851476.1851489
Publication status: Published - 16 Dec 2010
Externally published: Yes
Event: 19th ACM International Symposium on High Performance Distributed Computing, HPDC 2010 - Chicago, IL, United States
Duration: 21 Jun 2010 to 25 Jun 2010

Other

Other: 19th ACM International Symposium on High Performance Distributed Computing, HPDC 2010
Country: United States
City: Chicago, IL
Period: 21/6/10 to 25/6/10

Fingerprint

Distributed computer systems
Scheduling algorithms
Computer hardware
Students
Processing

Keywords

  • Cloud computing
  • MapReduce
  • Volunteer computing

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software

Cite this

Lin, H., Ma, X., Archuleta, J., Feng, W. C., Gardner, M., & Zhang, Z. (2010). MOON: MapReduce on opportunistic eNvironments. In HPDC 2010 - Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (pp. 95-106) https://doi.org/10.1145/1851476.1851489

MOON: MapReduce on opportunistic eNvironments. / Lin, Heshan; Ma, Xiaosong; Archuleta, Jeremy; Feng, Wu Chun; Gardner, Mark; Zhang, Zhe.

HPDC 2010 - Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing. 2010. p. 95-106.

Lin, H, Ma, X, Archuleta, J, Feng, WC, Gardner, M & Zhang, Z 2010, MOON: MapReduce on opportunistic eNvironments. in HPDC 2010 - Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing. pp. 95-106, 19th ACM International Symposium on High Performance Distributed Computing, HPDC 2010, Chicago, IL, United States, 21/6/10. https://doi.org/10.1145/1851476.1851489
Lin H, Ma X, Archuleta J, Feng WC, Gardner M, Zhang Z. MOON: MapReduce on opportunistic eNvironments. In HPDC 2010 - Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing. 2010. p. 95-106 https://doi.org/10.1145/1851476.1851489
Lin, Heshan; Ma, Xiaosong; Archuleta, Jeremy; Feng, Wu Chun; Gardner, Mark; Zhang, Zhe. / MOON: MapReduce on opportunistic eNvironments. HPDC 2010 - Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing. 2010. pp. 95-106
@inproceedings{8612b13eb40f4dd08fccf5a16701336b,
title = "{MOON}: {MapReduce} on opportunistic {eNvironments}",
abstract = "MapReduce offers an ease-of-use programming paradigm for processing large data sets, making it an attractive model for distributed volunteer computing systems. However, unlike on dedicated resources, where MapReduce has mostly been deployed, such volunteer computing systems have significantly higher rates of node unavailability. Furthermore, nodes are not fully controlled by the MapReduce framework. Consequently, we found the data and task replication scheme adopted by existing MapReduce implementations woefully inadequate for resources with high unavailability. To address this, we propose MOON, short for MapReduce On Opportunistic eNvironments. MOON extends Hadoop, an open-source implementation of MapReduce, with adaptive task and data scheduling algorithms in order to offer reliable MapReduce services on a hybrid resource architecture, where volunteer computing systems are supplemented by a small set of dedicated nodes. Our tests on an emulated volunteer computing system, which uses a 60-node cluster where each node possesses a similar hardware configuration to a typical computer in a student lab, demonstrate that MOON can deliver a three-fold performance improvement to Hadoop in volatile, volunteer computing environments.",
keywords = "Cloud computing, MapReduce, Volunteer computing",
author = "Heshan Lin and Xiaosong Ma and Jeremy Archuleta and Feng, {Wu Chun} and Mark Gardner and Zhe Zhang",
year = "2010",
month = "12",
day = "16",
doi = "10.1145/1851476.1851489",
language = "English",
isbn = "9781605589428",
pages = "95--106",
booktitle = "HPDC 2010 - Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing",

}

TY - GEN

T1 - MOON

T2 - MapReduce on opportunistic eNvironments

AU - Lin, Heshan

AU - Ma, Xiaosong

AU - Archuleta, Jeremy

AU - Feng, Wu Chun

AU - Gardner, Mark

AU - Zhang, Zhe

PY - 2010/12/16

Y1 - 2010/12/16

AB - MapReduce offers an ease-of-use programming paradigm for processing large data sets, making it an attractive model for distributed volunteer computing systems. However, unlike on dedicated resources, where MapReduce has mostly been deployed, such volunteer computing systems have significantly higher rates of node unavailability. Furthermore, nodes are not fully controlled by the MapReduce framework. Consequently, we found the data and task replication scheme adopted by existing MapReduce implementations woefully inadequate for resources with high unavailability. To address this, we propose MOON, short for MapReduce On Opportunistic eNvironments. MOON extends Hadoop, an open-source implementation of MapReduce, with adaptive task and data scheduling algorithms in order to offer reliable MapReduce services on a hybrid resource architecture, where volunteer computing systems are supplemented by a small set of dedicated nodes. Our tests on an emulated volunteer computing system, which uses a 60-node cluster where each node possesses a similar hardware configuration to a typical computer in a student lab, demonstrate that MOON can deliver a three-fold performance improvement to Hadoop in volatile, volunteer computing environments.

KW - Cloud computing

KW - MapReduce

KW - Volunteer computing

UR - http://www.scopus.com/inward/record.url?scp=78650029124&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78650029124&partnerID=8YFLogxK

U2 - 10.1145/1851476.1851489

DO - 10.1145/1851476.1851489

M3 - Conference contribution

AN - SCOPUS:78650029124

SN - 9781605589428

SP - 95

EP - 106

BT - HPDC 2010 - Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing

ER -