Case study of scientific data processing on a cloud using Hadoop

Chen Zhang, Hans De Sterck, Ashraf Aboulnaga, Haig Djambazian, Rob Sladek

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

35 Citations (Scopus)

Abstract

With the increasing popularity of cloud computing, Hadoop has become a widely used open-source cloud computing framework for large-scale data processing. However, few efforts have been made to demonstrate the applicability of Hadoop to real-world application scenarios in fields other than server-side computations such as web indexing. In this paper, we use the Hadoop cloud computing framework to develop a user application that allows processing of scientific data on clouds. We describe a simple extension to Hadoop's MapReduce that allows it to handle scientific data processing problems with arbitrary input formats and explicit control over how the input is split. We use this approach to develop a Hadoop-based cloud computing application that processes sequences of microscope images of live cells, and we test its performance. We also discuss how the approach can be generalized to more complicated scientific data processing problems.
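The idea of explicit control over how the input is split can be illustrated with a minimal sketch. This is not the authors' Hadoop extension and it uses no Hadoop API: the function names `make_splits` and `run_map_phase`, the fixed items-per-split policy, and the file names are all assumptions made for illustration only.

```python
from typing import Callable, Iterable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def make_splits(items: Sequence[T], items_per_split: int) -> List[List[T]]:
    """Group input items into explicit splits, one split per map task.

    Hypothetical policy: consecutive items stay together, so frames that
    belong to the same image sequence are handled by the same map task.
    """
    if items_per_split < 1:
        raise ValueError("items_per_split must be at least 1")
    return [list(items[i:i + items_per_split])
            for i in range(0, len(items), items_per_split)]


def run_map_phase(splits: Iterable[List[T]],
                  map_fn: Callable[[List[T]], R]) -> List[R]:
    """Apply map_fn to each split in turn.

    A real framework would schedule these calls in parallel across nodes;
    this sketch runs them sequentially.
    """
    return [map_fn(split) for split in splits]


# Hypothetical input: ten microscope frames identified by file name.
frames = [f"frame_{i:03d}.tif" for i in range(10)]
splits = make_splits(frames, items_per_split=4)

# Placeholder map function: report how many frames each task received.
counts = run_map_phase(splits, len)
print(counts)  # [4, 4, 2]
```

The point of the sketch is only that the caller, rather than the framework's default byte-range splitter, decides which input items end up in the same map task.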

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 400-415
Number of pages: 16
Volume: 5976 LNCS
DOIs: https://doi.org/10.1007/978-3-642-12659-8_29
Publication status: Published - 21 May 2010
Externally published: Yes
Event: 23rd International Symposium on High Performance Computing Systems and Applications, HPCS 2009 - Kingston, ON, Canada
Duration: 14 Jun 2009 to 17 Jun 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5976 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 23rd International Symposium on High Performance Computing Systems and Applications, HPCS 2009
Country: Canada
City: Kingston, ON
Period: 14/6/09 to 17/6/09

Fingerprint

Cloud computing
MapReduce
Performance Test
Real-world Applications
Open Source
Microscope
Indexing
Servers
Scenarios
Cell
Arbitrary
Processing
Demonstrate
Framework

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Zhang, C., De Sterck, H., Aboulnaga, A., Djambazian, H., & Sladek, R. (2010). Case study of scientific data processing on a cloud using Hadoop. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5976 LNCS, pp. 400-415). https://doi.org/10.1007/978-3-642-12659-8_29

@inproceedings{003912920c44429bb0b2f2908bc8ca3e,
title = "Case study of scientific data processing on a cloud using Hadoop",
abstract = "With the increasing popularity of cloud computing, Hadoop has become a widely used open source cloud computing framework for large scale data processing. However, few efforts have been made to demonstrate the applicability of Hadoop to various real-world application scenarios in fields other than server side computations such as web indexing, etc. In this paper, we use the Hadoop cloud computing framework to develop a user application that allows processing of scientific data on clouds. A simple extension to Hadoop's MapReduce is described which allows it to handle scientific data processing problems with arbitrary input formats and explicit control over how the input is split. This approach is used to develop a Hadoop-based cloud computing application that processes sequences of microscope images of live cells, and we test its performance. It is discussed how the approach can be generalized to more complicated scientific data processing problems.",
author = "Chen Zhang and {De Sterck}, Hans and Ashraf Aboulnaga and Haig Djambazian and Rob Sladek",
year = "2010",
month = "5",
day = "21",
doi = "10.1007/978-3-642-12659-8_29",
language = "English",
isbn = "3642126588",
volume = "5976 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "400--415",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Case study of scientific data processing on a cloud using Hadoop

AU - Zhang, Chen

AU - De Sterck, Hans

AU - Aboulnaga, Ashraf

AU - Djambazian, Haig

AU - Sladek, Rob

PY - 2010/5/21

Y1 - 2010/5/21

N2 - With the increasing popularity of cloud computing, Hadoop has become a widely used open source cloud computing framework for large scale data processing. However, few efforts have been made to demonstrate the applicability of Hadoop to various real-world application scenarios in fields other than server side computations such as web indexing, etc. In this paper, we use the Hadoop cloud computing framework to develop a user application that allows processing of scientific data on clouds. A simple extension to Hadoop's MapReduce is described which allows it to handle scientific data processing problems with arbitrary input formats and explicit control over how the input is split. This approach is used to develop a Hadoop-based cloud computing application that processes sequences of microscope images of live cells, and we test its performance. It is discussed how the approach can be generalized to more complicated scientific data processing problems.

UR - http://www.scopus.com/inward/record.url?scp=77952400308&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77952400308&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-12659-8_29

DO - 10.1007/978-3-642-12659-8_29

M3 - Conference contribution

SN - 3642126588

SN - 9783642126581

VL - 5976 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 400

EP - 415

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -