Automatic Cloud I/O Configurator for I/O Intensive Parallel Applications

Jidong Zhai, Mingliang Liu, Ye Jin, Xiaosong Ma, Wenguang Chen

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

As the cloud platform becomes a promising alternative to traditional HPC (high performance computing) centers or in-house clusters, the I/O bottleneck problem is highlighted in this new environment, which typically offers top-of-the-line compute instances but sub-par communication and I/O facilities. It has been observed that changing the cloud I/O system configuration, such as the choice of file system, the number of I/O servers, and their placement strategy, leads to considerable variation in the performance and cost efficiency of I/O intensive parallel applications. However, storage system configuration is tedious and error-prone to do manually, even for expert users, leading to solutions that are grossly over-provisioned (low cost efficiency), substantially under-performing (poor performance) or, in the worst case, both. This paper proposes ACIC, a system which automatically searches for optimized I/O system configurations from many candidates for each individual application running on a given cloud platform. ACIC takes advantage of machine learning models to perform performance/cost predictions. To tackle the high-dimensional parameter exploration space, we enable affordable, reusable, and incremental training on cloud platforms, guided by Plackett-Burman matrices for experiment design. Our evaluation results with four representative parallel applications indicate that ACIC consistently identifies optimal or near-optimal configurations among a large group of candidate settings. The top ACIC-recommended configuration is capable of improving the applications' performance by a factor of up to 10.5 (3.1 on average) and reducing cost by up to 89 percent (51 percent on average), compared with a commonly used baseline I/O configuration.
In addition, we carried out a small-scale user study for one of the test applications, which found that ACIC consistently beat the user and even the application's developer, often by a significant margin, in selecting optimized configurations.
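
The Plackett-Burman experiment design mentioned in the abstract can be illustrated with a minimal sketch. This is not ACIC's implementation: it only builds the standard 12-run Plackett-Burman matrix for up to 11 two-level factors and estimates each factor's main effect from measured responses. The function names and the synthetic responses are assumptions made for illustration; in a setting like the one described, each column might stand for a configuration choice such as the file system or the number of I/O servers.

```python
# Standard generator row of the 12-run Plackett-Burman design (11 factors).
GENERATOR = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]


def plackett_burman_12():
    """Return the 12 x 11 design: 11 cyclic shifts of the generator plus an all-low row."""
    n = len(GENERATOR)
    rows = [GENERATOR[n - shift:] + GENERATOR[:n - shift] for shift in range(n)]
    rows.append([-1] * n)
    return rows


def main_effects(design, responses):
    """Estimate each factor's main effect: mean response at +1 minus mean response at -1."""
    half = len(design) / 2  # each column holds equally many +1 and -1 entries
    n_factors = len(design[0])
    return [
        sum(row[j] * y for row, y in zip(design, responses)) / half
        for j in range(n_factors)
    ]


if __name__ == "__main__":
    design = plackett_burman_12()
    # Synthetic responses (hypothetical): only factors 0 and 4 influence the result.
    responses = [10 + 3 * row[0] - 2 * row[4] for row in design]
    for j, effect in enumerate(main_effects(design, responses)):
        print(f"factor {j}: estimated main effect {effect:+.1f}")
```

Because the columns are balanced and mutually orthogonal, 12 runs are enough to rank 11 factors by main effect, which is the property that makes such designs attractive for pruning a large configuration space before any more expensive model training.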

Original language: English
Article number: 6977978
Pages (from-to): 3275-3288
Number of pages: 14
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 26
Issue number: 12
Publication status: Published - 1 Dec 2015

Keywords

  • Cloud Computing
  • Parallel Applications
  • Performance Tool
  • Storage Configuration

ASJC Scopus subject areas

  • Hardware and Architecture
  • Signal Processing
  • Computational Theory and Mathematics
