Massively parallel genomic sequence search on the Blue Gene/P architecture

Heshan Lin, Pavan Balaji, Ruth Poole, Carlos Sosa, Xiaosong Ma, Wu Chun Feng

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

31 Citations (Scopus)

Abstract

This paper presents our first experiences in mapping and optimizing genomic sequence search onto the massively parallel IBM Blue Gene/P (BG/P) platform. Specifically, we performed our work on mpiBLAST, a parallel sequence-search code that has been optimized on numerous supercomputing environments. In doing so, we identify several critical performance issues. Consequently, we propose and study different approaches for mapping sequence-search and parallel I/O tasks on such massively parallel architectures. We demonstrate that our optimizations can deliver nearly linear scaling (93% efficiency) on up to 32,768 cores of BG/P. In addition, we show that such scalability enables us to complete a large-scale bioinformatics problem - sequence searching a microbial genome database against itself to support the discovery of missing genes in genomes - in only a few hours on BG/P. Previously, this problem was viewed as computationally intractable in practice.

Original language: English
Title of host publication: 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008
DOIs: https://doi.org/10.1109/SC.2008.5222005
Publication status: Published - 1 Dec 2008
Externally published: Yes
Event: 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008 - Austin, TX, United States
Duration: 15 Nov 2008 - 21 Nov 2008

Other

Other: 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008
Country: United States
City: Austin, TX
Period: 15/11/08 - 21/11/08

Fingerprint

Genes
Parallel architectures
Bioinformatics
Scalability

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software

Cite this

Lin, H., Balaji, P., Poole, R., Sosa, C., Ma, X., & Feng, W. C. (2008). Massively parallel genomic sequence search on the Blue Gene/P architecture. In 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008 [5222005] https://doi.org/10.1109/SC.2008.5222005

Massively parallel genomic sequence search on the Blue Gene/P architecture. / Lin, Heshan; Balaji, Pavan; Poole, Ruth; Sosa, Carlos; Ma, Xiaosong; Feng, Wu Chun.

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008. 2008. 5222005.


Lin, H, Balaji, P, Poole, R, Sosa, C, Ma, X & Feng, WC 2008, Massively parallel genomic sequence search on the Blue Gene/P architecture. in 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008., 5222005, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008, Austin, TX, United States, 15/11/08. https://doi.org/10.1109/SC.2008.5222005
Lin H, Balaji P, Poole R, Sosa C, Ma X, Feng WC. Massively parallel genomic sequence search on the Blue Gene/P architecture. In 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008. 2008. 5222005 https://doi.org/10.1109/SC.2008.5222005
Lin, Heshan ; Balaji, Pavan ; Poole, Ruth ; Sosa, Carlos ; Ma, Xiaosong ; Feng, Wu Chun. / Massively parallel genomic sequence search on the Blue Gene/P architecture. 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008. 2008.
@inproceedings{c4d7ddf34fa845cf852c98e08ce4beb9,
title = "Massively parallel genomic sequence search on the Blue Gene/P architecture",
abstract = "This paper presents our first experiences in mapping and optimizing genomic sequence search onto the massively parallel IBM Blue Gene/P (BG/P) platform. Specifically, we performed our work on mpiBLAST, a parallel sequence-search code that has been optimized on numerous supercomputing environments. In doing so, we identify several critical performance issues. Consequently, we propose and study different approaches for mapping sequence-search and parallel I/O tasks on such massively parallel architectures. We demonstrate that our optimizations can deliver nearly linear scaling (93{\%} efficiency) on up to 32,768 cores of BG/P. In addition, we show that such scalability enables us to complete a large-scale bioinformatics problem - sequence searching a microbial genome database against itself to support the discovery of missing genes in genomes - in only a few hours on BG/P. Previously, this problem was viewed as computationally intractable in practice.",
author = "Heshan Lin and Pavan Balaji and Ruth Poole and Carlos Sosa and Xiaosong Ma and Feng, {Wu Chun}",
year = "2008",
month = "12",
day = "1",
doi = "10.1109/SC.2008.5222005",
language = "English",
isbn = "9781424428359",
booktitle = "2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008",

}

TY - GEN

T1 - Massively parallel genomic sequence search on the Blue Gene/P architecture

AU - Lin, Heshan

AU - Balaji, Pavan

AU - Poole, Ruth

AU - Sosa, Carlos

AU - Ma, Xiaosong

AU - Feng, Wu Chun

PY - 2008/12/1

Y1 - 2008/12/1

N2 - This paper presents our first experiences in mapping and optimizing genomic sequence search onto the massively parallel IBM Blue Gene/P (BG/P) platform. Specifically, we performed our work on mpiBLAST, a parallel sequence-search code that has been optimized on numerous supercomputing environments. In doing so, we identify several critical performance issues. Consequently, we propose and study different approaches for mapping sequence-search and parallel I/O tasks on such massively parallel architectures. We demonstrate that our optimizations can deliver nearly linear scaling (93% efficiency) on up to 32,768 cores of BG/P. In addition, we show that such scalability enables us to complete a large-scale bioinformatics problem - sequence searching a microbial genome database against itself to support the discovery of missing genes in genomes - in only a few hours on BG/P. Previously, this problem was viewed as computationally intractable in practice.

AB - This paper presents our first experiences in mapping and optimizing genomic sequence search onto the massively parallel IBM Blue Gene/P (BG/P) platform. Specifically, we performed our work on mpiBLAST, a parallel sequence-search code that has been optimized on numerous supercomputing environments. In doing so, we identify several critical performance issues. Consequently, we propose and study different approaches for mapping sequence-search and parallel I/O tasks on such massively parallel architectures. We demonstrate that our optimizations can deliver nearly linear scaling (93% efficiency) on up to 32,768 cores of BG/P. In addition, we show that such scalability enables us to complete a large-scale bioinformatics problem - sequence searching a microbial genome database against itself to support the discovery of missing genes in genomes - in only a few hours on BG/P. Previously, this problem was viewed as computationally intractable in practice.

UR - http://www.scopus.com/inward/record.url?scp=70350782517&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70350782517&partnerID=8YFLogxK

U2 - 10.1109/SC.2008.5222005

DO - 10.1109/SC.2008.5222005

M3 - Conference contribution

SN - 9781424428359

BT - 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008

ER -