S-preconditioner for multi-fold data reduction with guaranteed user-controlled accuracy

Ye Jin, Sriram Lakshminarasimhan, Neil Shah, Zhenhuan Gong, C. S. Chang, Jackie Chen, Stephane Ethier, Hemanth Kolla, Seung Hoe Ku, Scott Klasky, Robert Latham, Robert Ross, Karen Schuchardt, Nagiza F. Samatova

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

1 Citation (Scopus)

Abstract

The growing gap between the massive amounts of data generated by petascale scientific simulation codes and the capability of system hardware and software to effectively analyze this data necessitates data reduction. Yet, the increasing data complexity challenges most, if not all, of the existing data compression methods. In fact, lossless compression techniques offer no more than 10% reduction on scientific data that we have experience with, which is widely regarded as effectively incompressible. To bridge this gap, in this paper, we advocate a transformative strategy that enables fast, accurate, and multi-fold reduction of double-precision floating-point scientific data. The intuition behind our method is inspired by an effective use of preconditioners for linear algebra solvers optimized for a particular class of computational "dwarfs" (e.g., dense or sparse matrices). Focusing on a commonly used multi-resolution wavelet compression technique as the underlying "solver" for data reduction, we propose the S-preconditioner, which transforms scientific data into a form with high global regularity to ensure a significant decrease in the number of wavelet coefficients stored for a segment of data. Combined with the subsequent EQ-calibrator, our resultant method, called S-Preconditioned EQ-Calibrated Wavelets (SPEQC-WAVELETS), robustly achieved a 4- to 5-fold data reduction while guaranteeing user-defined accuracy of reconstructed data to be within 1% point-by-point relative error, lower than 0.01 Normalized RMSE, and higher than 0.99 Pearson Correlation. In this paper, we show the results we obtained by testing our method on six petascale simulation codes including fusion, combustion, climate, astrophysics, and subsurface groundwater in addition to 13 publicly available scientific datasets. We also demonstrate that application-driven data mining tasks performed on decompressed variables or their derived quantities produce results of comparable quality with the ones for the original data.
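
The three accuracy criteria named in the abstract (point-by-point relative error, Normalized RMSE, and Pearson correlation) can be illustrated with a minimal sketch. This is not the authors' code. The function names are hypothetical, normalizing the RMSE by the value range of the original data is an assumption (the abstract does not state the normalization), and treating a nonzero reconstruction of an exact zero as an unbounded relative error is likewise an assumption.

```python
import math


def max_pointwise_relative_error(original, reconstructed):
    """Largest |x - y| / |x| over all points (hypothetical helper)."""
    worst = 0.0
    for x, y in zip(original, reconstructed):
        if x == 0.0:
            # Assumption: a zero must be reconstructed exactly,
            # otherwise the relative error is treated as unbounded.
            if y != 0.0:
                return math.inf
            continue
        worst = max(worst, abs(x - y) / abs(x))
    return worst


def normalized_rmse(original, reconstructed):
    """RMSE divided by the range of the original data (assumed normalization)."""
    n = len(original)
    mse = sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / n
    value_range = max(original) - min(original)
    return math.sqrt(mse) / value_range


def pearson_correlation(original, reconstructed):
    """Sample Pearson correlation coefficient between the two sequences."""
    n = len(original)
    mean_x = sum(original) / n
    mean_y = sum(reconstructed) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(original, reconstructed))
    var_x = sum((x - mean_x) ** 2 for x in original)
    var_y = sum((y - mean_y) ** 2 for y in reconstructed)
    return cov / math.sqrt(var_x * var_y)


def meets_accuracy_targets(original, reconstructed,
                           max_rel_err=0.01, max_nrmse=0.01, min_pearson=0.99):
    """Check the three thresholds quoted in the abstract (defaults are those values)."""
    return (max_pointwise_relative_error(original, reconstructed) <= max_rel_err
            and normalized_rmse(original, reconstructed) < max_nrmse
            and pearson_correlation(original, reconstructed) > min_pearson)


if __name__ == "__main__":
    # Toy data only: a reconstruction that is uniformly 0.5% too large.
    original = [1.0, 2.0, 4.0, 8.0, 16.0]
    reconstructed = [x * 1.005 for x in original]
    print(max_pointwise_relative_error(original, reconstructed))
    print(normalized_rmse(original, reconstructed))
    print(pearson_correlation(original, reconstructed))
    print(meets_accuracy_targets(original, reconstructed))
```

The point-by-point check is the strictest of the three in this sketch, because a single badly reconstructed value fails it even when the two aggregate metrics remain within bounds.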

Original language: English
Title of host publication: Proceedings - IEEE International Conference on Data Mining, ICDM
Pages: 290-299
Number of pages: 10
DOIs: https://doi.org/10.1109/ICDM.2011.138
Publication status: Published - 1 Dec 2011
Event: 11th IEEE International Conference on Data Mining, ICDM 2011 - Vancouver, BC, Canada
Duration: 11 Dec 2011 - 14 Dec 2011

Other

Other: 11th IEEE International Conference on Data Mining, ICDM 2011
Country: Canada
City: Vancouver, BC
Period: 11/12/11 - 14/12/11

Fingerprint

Data reduction
Astrophysics
Linear algebra
Data compression
Data mining
Groundwater
Fusion reactions
Hardware
Testing

Keywords

  • Data mining over decompressed data
  • Data reduction
  • Extreme-scale data analytics
  • In situ data analytics
  • Preconditioners for data mining

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Jin, Y., Lakshminarasimhan, S., Shah, N., Gong, Z., Chang, C. S., Chen, J., ... Samatova, N. F. (2011). S-preconditioner for multi-fold data reduction with guaranteed user-controlled accuracy. In Proceedings - IEEE International Conference on Data Mining, ICDM (pp. 290-299). [6137233] https://doi.org/10.1109/ICDM.2011.138

S-preconditioner for multi-fold data reduction with guaranteed user-controlled accuracy. / Jin, Ye; Lakshminarasimhan, Sriram; Shah, Neil; Gong, Zhenhuan; Chang, C. S.; Chen, Jackie; Ethier, Stephane; Kolla, Hemanth; Ku, Seung Hoe; Klasky, Scott; Latham, Robert; Ross, Robert; Schuchardt, Karen; Samatova, Nagiza F.

Proceedings - IEEE International Conference on Data Mining, ICDM. 2011. p. 290-299 6137233.

Jin, Y, Lakshminarasimhan, S, Shah, N, Gong, Z, Chang, CS, Chen, J, Ethier, S, Kolla, H, Ku, SH, Klasky, S, Latham, R, Ross, R, Schuchardt, K & Samatova, NF 2011, S-preconditioner for multi-fold data reduction with guaranteed user-controlled accuracy. in Proceedings - IEEE International Conference on Data Mining, ICDM., 6137233, pp. 290-299, 11th IEEE International Conference on Data Mining, ICDM 2011, Vancouver, BC, Canada, 11/12/11. https://doi.org/10.1109/ICDM.2011.138
Jin Y, Lakshminarasimhan S, Shah N, Gong Z, Chang CS, Chen J et al. S-preconditioner for multi-fold data reduction with guaranteed user-controlled accuracy. In Proceedings - IEEE International Conference on Data Mining, ICDM. 2011. p. 290-299. 6137233 https://doi.org/10.1109/ICDM.2011.138
Jin, Ye ; Lakshminarasimhan, Sriram ; Shah, Neil ; Gong, Zhenhuan ; Chang, C. S. ; Chen, Jackie ; Ethier, Stephane ; Kolla, Hemanth ; Ku, Seung Hoe ; Klasky, Scott ; Latham, Robert ; Ross, Robert ; Schuchardt, Karen ; Samatova, Nagiza F. / S-preconditioner for multi-fold data reduction with guaranteed user-controlled accuracy. Proceedings - IEEE International Conference on Data Mining, ICDM. 2011. pp. 290-299
@inproceedings{f51ce1cff1db481fb96429f4a69e181d,
title = "S-preconditioner for multi-fold data reduction with guaranteed user-controlled accuracy",
abstract = "The growing gap between the massive amounts of data generated by petascale scientific simulation codes and the capability of system hardware and software to effectively analyze this data necessitates data reduction. Yet, the increasing data complexity challenges most, if not all, of the existing data compression methods. In fact, lossless compression techniques offer no more than 10{\%} reduction on scientific data that we have experience with, which is widely regarded as effectively incompressible. To bridge this gap, in this paper, we advocate a transformative strategy that enables fast, accurate, and multi-fold reduction of double-precision floating-point scientific data. The intuition behind our method is inspired by an effective use of preconditioners for linear algebra solvers optimized for a particular class of computational {"}dwarfs{"} (e.g., dense or sparse matrices). Focusing on a commonly used multi-resolution wavelet compression technique as the underlying {"}solver{"} for data reduction, we propose the S-preconditioner, which transforms scientific data into a form with high global regularity to ensure a significant decrease in the number of wavelet coefficients stored for a segment of data. Combined with the subsequent EQ-calibrator, our resultant method, called S-Preconditioned EQ-Calibrated Wavelets (SPEQC-WAVELETS), robustly achieved a 4- to 5-fold data reduction while guaranteeing user-defined accuracy of reconstructed data to be within 1{\%} point-by-point relative error, lower than 0.01 Normalized RMSE, and higher than 0.99 Pearson Correlation. In this paper, we show the results we obtained by testing our method on six petascale simulation codes including fusion, combustion, climate, astrophysics, and subsurface groundwater in addition to 13 publicly available scientific datasets. We also demonstrate that application-driven data mining tasks performed on decompressed variables or their derived quantities produce results of comparable quality with the ones for the original data.",
keywords = "Data mining over decompressed data, Data reduction, Extreme-scale data analytics, In situ data analytics, Preconditioners for data mining",
author = "Ye Jin and Sriram Lakshminarasimhan and Neil Shah and Zhenhuan Gong and Chang, {C. S.} and Jackie Chen and Stephane Ethier and Hemanth Kolla and Ku, {Seung Hoe} and Scott Klasky and Robert Latham and Robert Ross and Karen Schuchardt and Samatova, {Nagiza F.}",
year = "2011",
month = "12",
day = "1",
doi = "10.1109/ICDM.2011.138",
language = "English",
isbn = "9780769544083",
pages = "290--299",
booktitle = "Proceedings - IEEE International Conference on Data Mining, ICDM",

}

TY - GEN

T1 - S-preconditioner for multi-fold data reduction with guaranteed user-controlled accuracy

AU - Jin, Ye

AU - Lakshminarasimhan, Sriram

AU - Shah, Neil

AU - Gong, Zhenhuan

AU - Chang, C. S.

AU - Chen, Jackie

AU - Ethier, Stephane

AU - Kolla, Hemanth

AU - Ku, Seung Hoe

AU - Klasky, Scott

AU - Latham, Robert

AU - Ross, Robert

AU - Schuchardt, Karen

AU - Samatova, Nagiza F.

PY - 2011/12/1

Y1 - 2011/12/1

N2 - The growing gap between the massive amounts of data generated by petascale scientific simulation codes and the capability of system hardware and software to effectively analyze this data necessitates data reduction. Yet, the increasing data complexity challenges most, if not all, of the existing data compression methods. In fact, lossless compression techniques offer no more than 10% reduction on scientific data that we have experience with, which is widely regarded as effectively incompressible. To bridge this gap, in this paper, we advocate a transformative strategy that enables fast, accurate, and multi-fold reduction of double-precision floating-point scientific data. The intuition behind our method is inspired by an effective use of preconditioners for linear algebra solvers optimized for a particular class of computational "dwarfs" (e.g., dense or sparse matrices). Focusing on a commonly used multi-resolution wavelet compression technique as the underlying "solver" for data reduction, we propose the S-preconditioner, which transforms scientific data into a form with high global regularity to ensure a significant decrease in the number of wavelet coefficients stored for a segment of data. Combined with the subsequent EQ-calibrator, our resultant method, called S-Preconditioned EQ-Calibrated Wavelets (SPEQC-WAVELETS), robustly achieved a 4- to 5-fold data reduction while guaranteeing user-defined accuracy of reconstructed data to be within 1% point-by-point relative error, lower than 0.01 Normalized RMSE, and higher than 0.99 Pearson Correlation. In this paper, we show the results we obtained by testing our method on six petascale simulation codes including fusion, combustion, climate, astrophysics, and subsurface groundwater in addition to 13 publicly available scientific datasets. We also demonstrate that application-driven data mining tasks performed on decompressed variables or their derived quantities produce results of comparable quality with the ones for the original data.

KW - Data mining over decompressed data

KW - Data reduction

KW - Extreme-scale data analytics

KW - In situ data analytics

KW - Preconditioners for data mining

UR - http://www.scopus.com/inward/record.url?scp=84863181022&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84863181022&partnerID=8YFLogxK

U2 - 10.1109/ICDM.2011.138

DO - 10.1109/ICDM.2011.138

M3 - Conference contribution

SN - 9780769544083

SP - 290

EP - 299

BT - Proceedings - IEEE International Conference on Data Mining, ICDM

ER -