Enhancing data migration performance via parallel data compression

Jonghyun Lee, M. Winslett, Xiaosong Ma, Shengke Yu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

Scientific simulations often produce large volumes of output that are moved to another platform for visualization or storage. This long-distance migration is slow due to the data size and slow network. Compression can improve migration performance by reducing the data size, but compression is computation-intensive and so can raise costs. In this work, we show how to reduce data migration cost by incorporating compression into migration. We analyze eight scientific data sets, and propose three approaches for parallel compression of scientific data. Our results show that with reasonably fast processors and typical parallel configurations, the compression cost for large scientific data is outweighed by the performance gain obtained by migrating less data. We found that a client-side compression approach (CC) can improve I/O and migration performance by an order of magnitude. In our experiments, CC always matches or outperforms migration without compression when we overlap migration with computation, even for not very compressible dense floating point data. We also present a variant of CC that is well suited for use with implementations of two-phase I/O.
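The trade-off the abstract describes (compression costs CPU time but shrinks the data to be sent) can be illustrated with a simple break-even check. The sketch below is not the paper's method or its measurements: it assumes a serial model in which data is first compressed and then sent, uses zlib as a stand-in compressor, and the function names and bandwidth figures are hypothetical.

```python
import time
import zlib
from typing import Tuple


def measure_compression(data: bytes, level: int = 6) -> Tuple[int, float]:
    """Compress data with zlib; return (compressed size in bytes, seconds spent)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return len(compressed), elapsed


def compression_pays_off(raw_bytes: int, compressed_bytes: int,
                         compress_seconds: float,
                         bandwidth_bytes_per_s: float) -> bool:
    """Assumed serial model: compress first, then send.

    Compression pays off when
    compress_seconds + compressed_bytes / bandwidth <= raw_bytes / bandwidth.
    """
    plain = raw_bytes / bandwidth_bytes_per_s
    with_compression = compress_seconds + compressed_bytes / bandwidth_bytes_per_s
    return with_compression <= plain


if __name__ == "__main__":
    # Hypothetical payload: highly compressible, unlike dense floating point data.
    payload = b"\x00" * 1_000_000
    size, seconds = measure_compression(payload)
    for bandwidth in (1e6, 1e9):  # assumed slow and fast links, in bytes per second
        verdict = compression_pays_off(len(payload), size, seconds, bandwidth)
        print(f"bandwidth={bandwidth:.0e} B/s, compress first: {verdict}")
```

Under this model, the slower the network relative to the processor, the more likely compression is to help, which is consistent with the abstract's qualification about reasonably fast processors.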

Original language: English
Title of host publication: Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2002
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 444-451
Number of pages: 8
ISBN (Print): 0769515738, 9780769515731
DOI: https://doi.org/10.1109/IPDPS.2002.1015528
Publication status: Published - 2002
Externally published: Yes
Event: 16th International Parallel and Distributed Processing Symposium, IPDPS 2002 - Ft. Lauderdale, United States
Duration: 15 Apr 2002 to 19 Apr 2002

Other

Other: 16th International Parallel and Distributed Processing Symposium, IPDPS 2002
Country: United States
City: Ft. Lauderdale
Period: 15/4/02 to 19/4/02

Fingerprint

Data compression
Migration
Compression
Costs
Visualization
Experiments
Floating point
Overlap
Configuration
Output

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Modelling and Simulation

Cite this

Lee, J., Winslett, M., Ma, X., & Yu, S. (2002). Enhancing data migration performance via parallel data compression. In Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2002 (pp. 444-451). [1015528] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IPDPS.2002.1015528

Enhancing data migration performance via parallel data compression. / Lee, Jonghyun; Winslett, M.; Ma, Xiaosong; Yu, Shengke.

Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2002. Institute of Electrical and Electronics Engineers Inc., 2002. p. 444-451 1015528.

Lee, J, Winslett, M, Ma, X & Yu, S 2002, Enhancing data migration performance via parallel data compression. in Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2002, 1015528, Institute of Electrical and Electronics Engineers Inc., pp. 444-451, 16th International Parallel and Distributed Processing Symposium, IPDPS 2002, Ft. Lauderdale, United States, 15/4/02. https://doi.org/10.1109/IPDPS.2002.1015528
Lee J, Winslett M, Ma X, Yu S. Enhancing data migration performance via parallel data compression. In Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2002. Institute of Electrical and Electronics Engineers Inc. 2002. p. 444-451. 1015528 https://doi.org/10.1109/IPDPS.2002.1015528
Lee, Jonghyun ; Winslett, M. ; Ma, Xiaosong ; Yu, Shengke. / Enhancing data migration performance via parallel data compression. Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2002. Institute of Electrical and Electronics Engineers Inc., 2002. pp. 444-451
@inproceedings{4de5a0b0e4b94dc1984af1ad5cec129c,
title = "Enhancing data migration performance via parallel data compression",
abstract = "Scientific simulations often produce large volumes of output that are moved to another platform for visualization or storage. This long-distance migration is slow due to the data size and slow network. Compression can improve migration performance by reducing the data size, but compression is computation-intensive and so can raise costs. In this work, we show how to reduce data migration cost by incorporating compression into migration. We analyze eight scientific data sets, and propose three approaches for parallel compression of scientific data. Our results show that with reasonably fast processors and typical parallel configurations, the compression cost for large scientific data is outweighed by the performance gain obtained by migrating less data. We found that a client-side compression approach (CC) can improve I/O and migration performance by an order of magnitude. In our experiments, CC always matches or outperforms migration without compression when we overlap migration with computation, even for not very compressible dense floating point data. We also present a variant of CC that is well suited for use with implementations of two-phase I/O.",
author = "Jonghyun Lee and M. Winslett and Xiaosong Ma and Shengke Yu",
year = "2002",
doi = "10.1109/IPDPS.2002.1015528",
language = "English",
isbn = "0769515738",
pages = "444--451",
booktitle = "Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2002",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY  - GEN
T1  - Enhancing data migration performance via parallel data compression
AU  - Lee, Jonghyun
AU  - Winslett, M.
AU  - Ma, Xiaosong
AU  - Yu, Shengke
PY  - 2002
Y1  - 2002
N2  - Scientific simulations often produce large volumes of output that are moved to another platform for visualization or storage. This long-distance migration is slow due to the data size and slow network. Compression can improve migration performance by reducing the data size, but compression is computation-intensive and so can raise costs. In this work, we show how to reduce data migration cost by incorporating compression into migration. We analyze eight scientific data sets, and propose three approaches for parallel compression of scientific data. Our results show that with reasonably fast processors and typical parallel configurations, the compression cost for large scientific data is outweighed by the performance gain obtained by migrating less data. We found that a client-side compression approach (CC) can improve I/O and migration performance by an order of magnitude. In our experiments, CC always matches or outperforms migration without compression when we overlap migration with computation, even for not very compressible dense floating point data. We also present a variant of CC that is well suited for use with implementations of two-phase I/O.
UR  - http://www.scopus.com/inward/record.url?scp=84966559314&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84966559314&partnerID=8YFLogxK
U2  - 10.1109/IPDPS.2002.1015528
DO  - 10.1109/IPDPS.2002.1015528
M3  - Conference contribution
AN  - SCOPUS:84966559314
SN  - 0769515738
SN  - 9780769515731
SP  - 444
EP  - 451
BT  - Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2002
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  -