Exploring I/O strategies for parallel sequence-search tools with S3aSim

Avery Ching, Wu Chun Feng, Heshan Lin, Xiaosong Ma, Alok Choudhary

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

10 Citations (Scopus)

Abstract

Parallel sequence-search tools are rising in popularity among computational biologists. With the rapid growth of sequence databases, database segmentation is the trend of the future for such search tools. While I/O currently is not a significant bottleneck for parallel sequence-search tools, future technologies including faster processors, customized computational hardware such as FPGAs, improved search algorithms, and exponentially growing databases will emphasize an increasing need for efficient parallel I/O in future parallel sequence-search tools. Our paper focuses on examining different I/O strategies for these future tools in a modern parallel file system (PVFS2). Because implementing and comparing various I/O algorithms in every search tool is labor-intensive and time-consuming, we introduce S3aSim, a general simulation framework for sequence-search which allows us to quickly implement, test, and profile various I/O strategies. We examine a variety of I/O strategies (e.g., master-writing and various worker-writing strategies using individual and collective I/O methods) for storing result data in sequence-search tools such as mpiBLAST, pioBLAST, and parallel HMMer. Our experiments fully detail the interaction of computing and I/O within a full application simulation as opposed to typical I/O-only benchmarks.

Original language: English
Title of host publication: Proceedings of the IEEE International Symposium on High Performance Distributed Computing
Pages: 229-240
Number of pages: 12
Volume: 2006
Publication status: Published - 1 Dec 2006
Externally published: Yes
Event: 15th IEEE International Symposium on High Performance Distributed Computing, HPDC-15 - Paris, France
Duration: 19 Jun 2006 to 23 Jun 2006

Other

Other: 15th IEEE International Symposium on High Performance Distributed Computing, HPDC-15
Country: France
City: Paris
Period: 19/6/06 to 23/6/06

ASJC Scopus subject areas

  • Engineering (all)

Cite this

Ching, A., Feng, W. C., Lin, H., Ma, X., & Choudhary, A. (2006). Exploring I/O strategies for parallel sequence-search tools with S3aSim. In Proceedings of the IEEE International Symposium on High Performance Distributed Computing (Vol. 2006, pp. 229-240). [1652154]

@inproceedings{72c624cf659d47f7b31dcb7a4d462c0e,
title = "Exploring I/O strategies for parallel sequence-search tools with S3aSim",
author = "Avery Ching and Feng, {Wu Chun} and Heshan Lin and Xiaosong Ma and Alok Choudhary",
year = "2006",
month = "12",
day = "1",
language = "English",
isbn = "1424403073",
volume = "2006",
pages = "229--240",
booktitle = "Proceedings of the IEEE International Symposium on High Performance Distributed Computing",

}

TY - GEN

T1 - Exploring I/O strategies for parallel sequence-search tools with S3aSim

AU - Ching, Avery

AU - Feng, Wu Chun

AU - Lin, Heshan

AU - Ma, Xiaosong

AU - Choudhary, Alok

PY - 2006/12/1

Y1 - 2006/12/1

UR - http://www.scopus.com/inward/record.url?scp=33845901845&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33845901845&partnerID=8YFLogxK

M3 - Conference contribution

SN - 1424403073

SN - 9781424403073

VL - 2006

SP - 229

EP - 240

BT - Proceedings of the IEEE International Symposium on High Performance Distributed Computing

ER -