ISOBAR preconditioner for effective and high-throughput lossless data compression

Eric R. Schendel, Ye Jin, Neil Shah, Jackie Chen, C. S. Chang, Seung Hoe Ku, Stephane Ethier, Scott Klasky, Robert Latham, Robert Ross, Nagiza F. Samatova

Research output: Contribution to journal › Article

49 Citations (Scopus)

Abstract

Efficient handling of large volumes of data is a necessity for exascale scientific applications and database systems. To address the growing imbalance between the amount of available storage and the amount of data being produced by high-speed (FLOPS) processors on the system, data must be compressed to reduce the total amount of data placed on the file systems. General-purpose lossless compression frameworks, such as zlib and bzlib2, are commonly used on datasets requiring lossless compression. Quite often, however, many scientific datasets compress poorly, referred to as hard-to-compress datasets, due to the negative impact of highly entropic content represented within the data. An important problem in better lossless data compression is to identify the hard-to-compress information and subsequently optimize the compression techniques at the byte level. To address this challenge, we introduce the In-Situ Orthogonal Byte Aggregate Reduction Compression (ISOBAR-compress) methodology as a preconditioner of lossless compression to identify and optimize the compression efficiency and throughput of hard-to-compress datasets.
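
Although not part of the published record, a minimal sketch may help illustrate the byte-level preconditioning idea the abstract describes: treat an array of fixed-size elements as byte columns, estimate each column's entropy, and pass only the low-entropy (compressible) columns to a standard lossless compressor such as zlib. The column layout, the entropy threshold, and the function names below are illustrative assumptions, not the authors' actual algorithm.

    import zlib
    from collections import Counter
    from math import log2

    def byte_column_entropy(data: bytes, element_size: int, col: int) -> float:
        # Shannon entropy (bits per byte) of one byte position taken
        # across every element in the buffer; 8.0 means uniformly random.
        column = data[col::element_size]
        counts = Counter(column)
        n = len(column)
        return -sum(c / n * log2(c / n) for c in counts.values())

    def precondition(data: bytes, element_size: int, threshold: float = 7.5):
        # Hypothetical ISOBAR-style preconditioner: split byte columns into
        # compressible (low-entropy) and hard-to-compress (near-random) sets,
        # then hand only the compressible bytes to zlib. The 7.5 bits/byte
        # cutoff is an assumed value, not one taken from the paper.
        compressible = [c for c in range(element_size)
                        if byte_column_entropy(data, element_size, c) < threshold]
        skipped = [c for c in range(element_size) if c not in compressible]
        packed = b"".join(data[c::element_size] for c in compressible)
        raw = b"".join(data[c::element_size] for c in skipped)
        return zlib.compress(packed), raw, compressible, skipped

For example, calling precondition(buf, 8) on a buffer of IEEE-754 doubles would typically mark the noisy low-order mantissa columns as incompressible while letting zlib work on the repetitive sign and exponent columns, avoiding wasted compressor throughput on near-random bytes.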

Original language: English
Article number: 6228079
Pages (from-to): 138-149
Number of pages: 12
Journal: Proceedings - International Conference on Data Engineering
DOI: 10.1109/ICDE.2012.114
Publication status: Published - 30 Jul 2012


ASJC Scopus subject areas

  • Information Systems
  • Signal Processing
  • Software

Cite this

Schendel, E. R., Jin, Y., Shah, N., Chen, J., Chang, C. S., Ku, S. H., ... Samatova, N. F. (2012). ISOBAR preconditioner for effective and high-throughput lossless data compression. Proceedings - International Conference on Data Engineering, 138-149. [6228079]. https://doi.org/10.1109/ICDE.2012.114

ISOBAR preconditioner for effective and high-throughput lossless data compression. / Schendel, Eric R.; Jin, Ye; Shah, Neil; Chen, Jackie; Chang, C. S.; Ku, Seung Hoe; Ethier, Stephane; Klasky, Scott; Latham, Robert; Ross, Robert; Samatova, Nagiza F.

In: Proceedings - International Conference on Data Engineering, 30.07.2012, p. 138-149.

Research output: Contribution to journal › Article

Schendel, ER, Jin, Y, Shah, N, Chen, J, Chang, CS, Ku, SH, Ethier, S, Klasky, S, Latham, R, Ross, R & Samatova, NF 2012, 'ISOBAR preconditioner for effective and high-throughput lossless data compression', Proceedings - International Conference on Data Engineering, pp. 138-149. https://doi.org/10.1109/ICDE.2012.114
Schendel, Eric R. ; Jin, Ye ; Shah, Neil ; Chen, Jackie ; Chang, C. S. ; Ku, Seung Hoe ; Ethier, Stephane ; Klasky, Scott ; Latham, Robert ; Ross, Robert ; Samatova, Nagiza F. / ISOBAR preconditioner for effective and high-throughput lossless data compression. In: Proceedings - International Conference on Data Engineering. 2012 ; pp. 138-149.
@article{1dbb2729cb794488a32f2cac95cfb1ea,
title = "ISOBAR preconditioner for effective and high-throughput lossless data compression",
abstract = "Efficient handling of large volumes of data is a necessity for exascale scientific applications and database systems. To address the growing imbalance between the amount of available storage and the amount of data being produced by high-speed (FLOPS) processors on the system, data must be compressed to reduce the total amount of data placed on the file systems. General-purpose lossless compression frameworks, such as zlib and bzlib2, are commonly used on datasets requiring lossless compression. Quite often, however, many scientific datasets compress poorly, referred to as hard-to-compress datasets, due to the negative impact of highly entropic content represented within the data. An important problem in better lossless data compression is to identify the hard-to-compress information and subsequently optimize the compression techniques at the byte level. To address this challenge, we introduce the In-Situ Orthogonal Byte Aggregate Reduction Compression (ISOBAR-compress) methodology as a preconditioner of lossless compression to identify and optimize the compression efficiency and throughput of hard-to-compress datasets.",
author = "Schendel, {Eric R.} and Ye Jin and Neil Shah and Jackie Chen and Chang, {C. S.} and Ku, {Seung Hoe} and Stephane Ethier and Scott Klasky and Robert Latham and Robert Ross and Samatova, {Nagiza F.}",
year = "2012",
month = "7",
day = "30",
doi = "10.1109/ICDE.2012.114",
language = "English",
pages = "138--149",
journal = "Proceedings - International Conference on Data Engineering",
issn = "1084-4627",
publisher = "IEEE",

}

TY - JOUR

T1 - ISOBAR preconditioner for effective and high-throughput lossless data compression

AU - Schendel, Eric R.

AU - Jin, Ye

AU - Shah, Neil

AU - Chen, Jackie

AU - Chang, C. S.

AU - Ku, Seung Hoe

AU - Ethier, Stephane

AU - Klasky, Scott

AU - Latham, Robert

AU - Ross, Robert

AU - Samatova, Nagiza F.

PY - 2012/7/30

Y1 - 2012/7/30

N2 - Efficient handling of large volumes of data is a necessity for exascale scientific applications and database systems. To address the growing imbalance between the amount of available storage and the amount of data being produced by high-speed (FLOPS) processors on the system, data must be compressed to reduce the total amount of data placed on the file systems. General-purpose lossless compression frameworks, such as zlib and bzlib2, are commonly used on datasets requiring lossless compression. Quite often, however, many scientific datasets compress poorly, referred to as hard-to-compress datasets, due to the negative impact of highly entropic content represented within the data. An important problem in better lossless data compression is to identify the hard-to-compress information and subsequently optimize the compression techniques at the byte level. To address this challenge, we introduce the In-Situ Orthogonal Byte Aggregate Reduction Compression (ISOBAR-compress) methodology as a preconditioner of lossless compression to identify and optimize the compression efficiency and throughput of hard-to-compress datasets.

AB - Efficient handling of large volumes of data is a necessity for exascale scientific applications and database systems. To address the growing imbalance between the amount of available storage and the amount of data being produced by high-speed (FLOPS) processors on the system, data must be compressed to reduce the total amount of data placed on the file systems. General-purpose lossless compression frameworks, such as zlib and bzlib2, are commonly used on datasets requiring lossless compression. Quite often, however, many scientific datasets compress poorly, referred to as hard-to-compress datasets, due to the negative impact of highly entropic content represented within the data. An important problem in better lossless data compression is to identify the hard-to-compress information and subsequently optimize the compression techniques at the byte level. To address this challenge, we introduce the In-Situ Orthogonal Byte Aggregate Reduction Compression (ISOBAR-compress) methodology as a preconditioner of lossless compression to identify and optimize the compression efficiency and throughput of hard-to-compress datasets.

UR - http://www.scopus.com/inward/record.url?scp=84864224817&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84864224817&partnerID=8YFLogxK

U2 - 10.1109/ICDE.2012.114

DO - 10.1109/ICDE.2012.114

M3 - Article

SP - 138

EP - 149

JO - Proceedings - International Conference on Data Engineering

JF - Proceedings - International Conference on Data Engineering

SN - 1084-4627

M1 - 6228079

ER -