PARLO: PArallel run-time layout optimization for scientific data explorations with heterogeneous access patterns

Zhenhuan Gong, David A. Boyuka, Xiaocheng Zou, Qing Liu, Norbert Podhorszki, Scott Klasky, Xiaosong Ma, Nagiza F. Samatova

Research output: Contribution to conference › Paper

15 Citations (Scopus)

Abstract

The size and scope of cutting-edge scientific simulations are growing much faster than the I/O and storage capabilities of their run-time environments. The growing gap is exacerbated by exploratory, data-intensive analytics, such as querying simulation data with multivariate, spatio-temporal constraints, which induces heterogeneous access patterns that stress the performance of the underlying storage system. Previous work addresses data layout and indexing techniques to improve query performance for a single access pattern, which is not sufficient for complex analytics jobs. We present PARLO, a parallel run-time layout optimization framework, to achieve multi-level data layout optimization for scientific applications at run-time before data is written to storage. The layout schemes optimize for heterogeneous access patterns with user-specified priorities. PARLO is integrated with ADIOS, a high-performance parallel I/O middleware for large-scale HPC applications, to achieve user-transparent, light-weight layout optimization for scientific datasets. It offers simple XML-based configuration for users to achieve flexible layout optimization without the need to modify or recompile application codes. Experiments show that PARLO improves performance by 2 to 26 times for queries with heterogeneous access patterns compared to state-of-the-art scientific database management systems. Compared to traditional post-processing approaches, its underlying run-time layout optimization achieves a 56% savings in processing time and a reduction in storage overhead of up to 50%. PARLO also exhibits a low run-time resource requirement, while limiting the performance impact on running applications to a reasonable level.

Original language: English
Pages: 343-351
Number of pages: 9
DOIs: https://doi.org/10.1109/CCGrid.2013.58
Publication status: Published - 14 Aug 2013
Externally published: Yes
Event: 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2013 - Delft, Netherlands
Duration: 13 May 2013 → 16 May 2013

Other

Other: 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2013
Country: Netherlands
City: Delft
Period: 13/5/13 → 16/5/13

ASJC Scopus subject areas

  • Software

Cite this

Gong, Z., Boyuka, D. A., Zou, X., Liu, Q., Podhorszki, N., Klasky, S., Ma, X., & Samatova, N. F. (2013). PARLO: PArallel run-time layout optimization for scientific data explorations with heterogeneous access patterns. 343-351. Paper presented at 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2013, Delft, Netherlands. https://doi.org/10.1109/CCGrid.2013.58