PARLO

PArallel run-time layout optimization for scientific data explorations with heterogeneous access patterns

Zhenhuan Gong, David A. Boyuka, Xiaocheng Zou, Qing Liu, Norbert Podhorszki, Scott Klasky, Xiaosong Ma, Nagiza F. Samatova

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Citations (Scopus)

Abstract

The size and scope of cutting-edge scientific simulations are growing much faster than the I/O and storage capabilities of their run-time environments. The growing gap is exacerbated by exploratory, data-intensive analytics, such as querying simulation data with multivariate, spatio-temporal constraints, which induce heterogeneous access patterns that stress the performance of the underlying storage system. Previous work addresses data layout and indexing techniques to improve query performance for a single access pattern, which is not sufficient for complex analytics jobs. We present PARLO, a parallel run-time layout optimization framework that achieves multi-level data layout optimization for scientific applications at run-time, before data is written to storage. The layout schemes optimize for heterogeneous access patterns with user-specified priorities. PARLO is integrated with ADIOS, a high-performance parallel I/O middleware for large-scale HPC applications, to achieve user-transparent, light-weight layout optimization for scientific datasets. It offers simple XML-based configuration so that users can achieve flexible layout optimization without modifying or recompiling application codes. Experiments show that PARLO improves performance by 2 to 26 times for queries with heterogeneous access patterns compared to state-of-the-art scientific database management systems. Compared to traditional post-processing approaches, its underlying run-time layout optimization achieves a 56% saving in processing time and a reduction in storage overhead of up to 50%. PARLO also exhibits a low run-time resource requirement and limits the performance impact on running applications to a reasonable level.

Original language: English
Title of host publication: Proceedings - 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2013
Pages: 343-351
Number of pages: 9
DOIs: https://doi.org/10.1109/CCGrid.2013.58
Publication status: Published - 14 Aug 2013
Externally published: Yes
Event: 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2013 - Delft, Netherlands
Duration: 13 May 2013 to 16 May 2013

Other

Other: 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2013
Country: Netherlands
City: Delft
Period: 13/5/13 to 16/5/13

Fingerprint

Processing
Middleware
XML
Experiments

ASJC Scopus subject areas

  • Software

Cite this

Gong, Z., Boyuka, D. A., Zou, X., Liu, Q., Podhorszki, N., Klasky, S., ... Samatova, N. F. (2013). PARLO: PArallel run-time layout optimization for scientific data explorations with heterogeneous access patterns. In Proceedings - 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2013 (pp. 343-351). [6546111] https://doi.org/10.1109/CCGrid.2013.58

@inproceedings{3f9c87471d1245c59a87018b5d889922,
title = "PARLO: PArallel run-time layout optimization for scientific data explorations with heterogeneous access patterns",
author = "Zhenhuan Gong and Boyuka, {David A.} and Xiaocheng Zou and Qing Liu and Norbert Podhorszki and Scott Klasky and Xiaosong Ma and Samatova, {Nagiza F.}",
year = "2013",
month = "8",
day = "14",
doi = "10.1109/CCGrid.2013.58",
language = "English",
pages = "343--351",
booktitle = "Proceedings - 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2013",

}

TY - GEN

T1 - PARLO

T2 - PArallel run-time layout optimization for scientific data explorations with heterogeneous access patterns

AU - Gong, Zhenhuan

AU - Boyuka, David A.

AU - Zou, Xiaocheng

AU - Liu, Qing

AU - Podhorszki, Norbert

AU - Klasky, Scott

AU - Ma, Xiaosong

AU - Samatova, Nagiza F.

PY - 2013/8/14

Y1 - 2013/8/14

UR - http://www.scopus.com/inward/record.url?scp=84881281874&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84881281874&partnerID=8YFLogxK

U2 - 10.1109/CCGrid.2013.58

DO - 10.1109/CCGrid.2013.58

M3 - Conference contribution

SP - 343

EP - 351

BT - Proceedings - 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2013

ER -