Case study of scientific data processing on a cloud using Hadoop

Chen Zhang, Hans De Sterck, Ashraf Aboulnaga, Haig Djambazian, Rob Sladek

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

37 Citations (Scopus)

Abstract

With the increasing popularity of cloud computing, Hadoop has become a widely used open-source cloud computing framework for large-scale data processing. However, few efforts have been made to demonstrate the applicability of Hadoop to real-world application scenarios outside server-side computations such as web indexing. In this paper, we use the Hadoop cloud computing framework to develop a user application that allows processing of scientific data on clouds. We describe a simple extension to Hadoop's MapReduce that allows it to handle scientific data processing problems with arbitrary input formats and explicit control over how the input is split. We use this approach to develop a Hadoop-based cloud computing application that processes sequences of microscope images of live cells, and we test its performance. We also discuss how the approach can be generalized to more complicated scientific data processing problems.
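
The idea of explicit control over how the input is split can be illustrated with a minimal sketch. It is written in plain Python and does not use the Hadoop API or reproduce the authors' extension. It assumes a hypothetical file naming scheme, `<sequence>_<frame>.tif`, and the helper names `make_splits`, `run_map`, and `sequence_id` are invented for illustration. Each split holds one complete image sequence, so a single map task sees all frames of that sequence.

```python
from collections import defaultdict


def sequence_id(path):
    # Assumed naming scheme (hypothetical): "<sequence>_<frame>.tif".
    return path.rsplit("_", 1)[0]


def make_splits(paths, key_of):
    """Group input files so that each split is one complete unit of work."""
    groups = defaultdict(list)
    for path in paths:
        groups[key_of(path)].append(path)
    # Sort keys and frames so the result is deterministic.
    return [sorted(groups[key]) for key in sorted(groups)]


def run_map(splits, mapper):
    """Apply the mapper once per split, as one map task per split would."""
    return [mapper(split) for split in splits]


paths = [
    "cellA_002.tif",
    "cellB_001.tif",
    "cellA_001.tif",
    "cellB_002.tif",
    "cellB_003.tif",
]

splits = make_splits(paths, sequence_id)

# Stand-in for real image processing: count the frames in each sequence.
frame_counts = run_map(splits, lambda split: (sequence_id(split[0]), len(split)))
```

Here the grouping key decides the split boundaries, rather than a fixed byte or line count, which is the property the abstract attributes to the extension.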

Original language: English
Title of host publication: High Performance Computing Systems and Applications - 23rd International Symposium, HPCS 2009, Revised Selected Papers
Pages: 400-415
Number of pages: 16
DOIs: https://doi.org/10.1007/978-3-642-12659-8_29
Publication status: Published - 21 May 2010
Event: 23rd International Symposium on High Performance Computing Systems and Applications, HPCS 2009 - Kingston, ON, Canada
Duration: 14 Jun 2009 to 17 Jun 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5976 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 23rd International Symposium on High Performance Computing Systems and Applications, HPCS 2009
Country: Canada
City: Kingston, ON
Period: 14/6/09 to 17/6/09

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Zhang, C., De Sterck, H., Aboulnaga, A., Djambazian, H., & Sladek, R. (2010). Case study of scientific data processing on a cloud using Hadoop. In High Performance Computing Systems and Applications - 23rd International Symposium, HPCS 2009, Revised Selected Papers (pp. 400-415). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5976 LNCS). https://doi.org/10.1007/978-3-642-12659-8_29