Coordinating computation and I/O in massively parallel sequence search

Heshan Lin, Xiaosong Ma, Wuchun Feng, Nagiza F. Samatova

Research output: Contribution to journal › Article

33 Citations (Scopus)

Abstract

With the explosive growth of genomic information, the searching of sequence databases has emerged as one of the most computation- and data-intensive scientific applications. Our previous studies suggested that parallel genomic sequence search possesses highly irregular computation and I/O patterns. Effectively addressing these runtime irregularities is thus the key to designing scalable sequence-search tools on massively parallel computers. While the computation scheduling for irregular scientific applications and the optimization of noncontiguous file accesses have been well-studied independently, little attention has been paid to the interplay between the two. In this paper, we systematically investigate the computation and I/O scheduling for data-intensive, irregular scientific applications within the context of genomic sequence search. Our study reveals that the lack of coordination between computation scheduling and I/O optimization could result in severe performance issues. We then propose an integrated scheduling approach that effectively improves sequence-search throughput by gracefully coordinating the dynamic load balancing of computation and high-performance noncontiguous I/O.
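As an illustration only (this is not the authors' implementation), the short Python sketch below models why dynamic load balancing and noncontiguous I/O interact. It assumes, for simplicity, that a master hands each query task to whichever worker becomes idle first, and that results must be stored in one shared output file in query order. Under those assumptions, a worker that receives tasks out of order ends up owning scattered regions of the file. The function names `dynamic_schedule`, `write_extents`, and `is_contiguous`, and the example costs and sizes, are hypothetical.

```python
import heapq


def dynamic_schedule(task_costs, num_workers):
    """Greedy dynamic load balancing (illustrative assumption): tasks are
    handed out in submission order to whichever worker becomes idle first."""
    # Min-heap of (time the worker becomes free, worker id); ties break on id.
    idle = [(0.0, w) for w in range(num_workers)]
    heapq.heapify(idle)
    assignment = {w: [] for w in range(num_workers)}
    for task, cost in enumerate(task_costs):
        free_at, w = heapq.heappop(idle)
        assignment[w].append(task)
        heapq.heappush(idle, (free_at + cost, w))
    return assignment


def write_extents(assignment, result_sizes):
    """Map each worker's tasks to (offset, length) extents in an output file
    that must hold results in task order (prefix sums of result sizes)."""
    offsets = [0]
    for size in result_sizes:
        offsets.append(offsets[-1] + size)
    return {
        w: [(offsets[t], result_sizes[t]) for t in tasks]
        for w, tasks in assignment.items()
    }


def is_contiguous(extents):
    """True if each extent starts exactly where the previous one ends."""
    return all(o1 + l1 == o2 for (o1, l1), (o2, _) in zip(extents, extents[1:]))


if __name__ == "__main__":
    # Irregular per-query costs: one long task keeps worker 0 busy.
    plan = dynamic_schedule([5, 1, 1, 1, 4, 1], num_workers=2)
    extents = write_extents(plan, [10, 20, 30, 40, 50, 60])
    for worker, ext in extents.items():
        print(worker, ext, "contiguous" if is_contiguous(ext) else "noncontiguous")
```

With these example inputs, worker 0 handles queries 0 and 5 and must write two separated regions of the file, while worker 1 handles queries 1 to 4 and writes one contiguous run. Balancing computation therefore shapes the I/O access pattern, which is the kind of interplay the abstract says must be coordinated.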

Original language: English
Article number: 5473216
Pages (from-to): 529-543
Number of pages: 15
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 22
Issue number: 4
DOIs: 10.1109/TPDS.2010.101
Publication status: Published - 14 Jan 2011
Externally published: Yes

Fingerprint

Scheduling
Dynamic loads
Resource allocation
Throughput

Keywords

  • bioinformatics
  • BLAST
  • parallel genomic sequence search
  • parallel I/O
  • Scheduling

ASJC Scopus subject areas

  • Hardware and Architecture
  • Signal Processing
  • Computational Theory and Mathematics

Cite this

Coordinating computation and I/O in massively parallel sequence search. / Lin, Heshan; Ma, Xiaosong; Feng, Wuchun; Samatova, Nagiza F.

In: IEEE Transactions on Parallel and Distributed Systems, Vol. 22, No. 4, 5473216, 14.01.2011, p. 529-543.

Lin, Heshan; Ma, Xiaosong; Feng, Wuchun; Samatova, Nagiza F. / Coordinating computation and I/O in massively parallel sequence search. In: IEEE Transactions on Parallel and Distributed Systems. 2011; Vol. 22, No. 4. pp. 529-543.

@article{5155f866922040a3996bc45df4ca2fc2,
title = "Coordinating computation and I/O in massively parallel sequence search",
abstract = "With the explosive growth of genomic information, the searching of sequence databases has emerged as one of the most computation and data-intensive scientific applications. Our previous studies suggested that parallel genomic sequence-search possesses highly irregular computation and I/O patterns. Effectively addressing these runtime irregularities is thus the key to designing scalable sequence-search tools on massively parallel computers. While the computation scheduling for irregular scientific applications and the optimization of noncontiguous file accesses have been well-studied independently, little attention has been paid to the interplay between the two. In this paper, we systematically investigate the computation and I/O scheduling for data-intensive, irregular scientific applications within the context of genomic sequence search. Our study reveals that the lack of coordination between computation scheduling and I/O optimization could result in severe performance issues. We then propose an integrated scheduling approach that effectively improves sequence-search throughput by gracefully coordinating the dynamic load balancing of computation and high-performance noncontiguous I/O.",
keywords = "bioinformatics, BLAST, parallel genomic sequence search, parallel I/O, Scheduling",
author = "Heshan Lin and Xiaosong Ma and Wuchun Feng and Samatova, {Nagiza F.}",
year = "2011",
month = "1",
day = "14",
doi = "10.1109/TPDS.2010.101",
language = "English",
volume = "22",
pages = "529--543",
journal = "IEEE Transactions on Parallel and Distributed Systems",
issn = "1045-9219",
publisher = "IEEE Computer Society",
number = "4",

}

TY  - JOUR
T1  - Coordinating computation and I/O in massively parallel sequence search
AU  - Lin, Heshan
AU  - Ma, Xiaosong
AU  - Feng, Wuchun
AU  - Samatova, Nagiza F.
PY  - 2011/1/14
Y1  - 2011/1/14
N2  - With the explosive growth of genomic information, the searching of sequence databases has emerged as one of the most computation and data-intensive scientific applications. Our previous studies suggested that parallel genomic sequence-search possesses highly irregular computation and I/O patterns. Effectively addressing these runtime irregularities is thus the key to designing scalable sequence-search tools on massively parallel computers. While the computation scheduling for irregular scientific applications and the optimization of noncontiguous file accesses have been well-studied independently, little attention has been paid to the interplay between the two. In this paper, we systematically investigate the computation and I/O scheduling for data-intensive, irregular scientific applications within the context of genomic sequence search. Our study reveals that the lack of coordination between computation scheduling and I/O optimization could result in severe performance issues. We then propose an integrated scheduling approach that effectively improves sequence-search throughput by gracefully coordinating the dynamic load balancing of computation and high-performance noncontiguous I/O.
AB  - With the explosive growth of genomic information, the searching of sequence databases has emerged as one of the most computation and data-intensive scientific applications. Our previous studies suggested that parallel genomic sequence-search possesses highly irregular computation and I/O patterns. Effectively addressing these runtime irregularities is thus the key to designing scalable sequence-search tools on massively parallel computers. While the computation scheduling for irregular scientific applications and the optimization of noncontiguous file accesses have been well-studied independently, little attention has been paid to the interplay between the two. In this paper, we systematically investigate the computation and I/O scheduling for data-intensive, irregular scientific applications within the context of genomic sequence search. Our study reveals that the lack of coordination between computation scheduling and I/O optimization could result in severe performance issues. We then propose an integrated scheduling approach that effectively improves sequence-search throughput by gracefully coordinating the dynamic load balancing of computation and high-performance noncontiguous I/O.
KW  - bioinformatics
KW  - BLAST
KW  - parallel genomic sequence search
KW  - parallel I/O
KW  - Scheduling
UR  - http://www.scopus.com/inward/record.url?scp=79952073478&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=79952073478&partnerID=8YFLogxK
U2  - 10.1109/TPDS.2010.101
DO  - 10.1109/TPDS.2010.101
M3  - Article
VL  - 22
SP  - 529
EP  - 543
JO  - IEEE Transactions on Parallel and Distributed Systems
JF  - IEEE Transactions on Parallel and Distributed Systems
SN  - 1045-9219
IS  - 4
M1  - 5473216
ER  - 