Server-Side Log Data Analytics for I/O Workload Characterization and Coordination on Large Shared Storage Systems

Yang Liu, Raghul Gunasekaran, Xiaosong Ma, Sudharshan S. Vazhkudai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Citations (Scopus)

Abstract

Inter-application I/O contention and performance interference have been recognized as severe problems. In this work, we demonstrate, through measurements from Titan (the world's No. 3 supercomputer), that high I/O variance coexists with individual storage units remaining under-utilized for the majority of the time. This motivates us to propose AID, a system that performs automatic application I/O characterization and I/O-aware job scheduling. AID analyzes existing I/O traffic and batch job history logs, without any prior knowledge of applications or user/developer involvement. It identifies the small set of I/O-intensive candidates among all applications running on a supercomputer and subsequently mines their I/O patterns, using more detailed per-I/O-node traffic logs. Based on this auto-extracted information, AID provides online I/O-aware scheduling recommendations to steer I/O-intensive applications away from heavy ongoing I/O activities. We evaluate AID on Titan, using both real applications (with extracted I/O patterns validated by contacting users) and our own pseudo-applications. Our results confirm that AID is able to (1) identify I/O-intensive applications and their detailed I/O characteristics, and (2) significantly reduce these applications' I/O performance degradation/variance by jointly evaluating outstanding applications' I/O patterns and real-time system I/O load.
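The two stages named in the abstract (finding I/O-intensive candidates from logs, then issuing I/O-aware scheduling recommendations) can be illustrated with a toy sketch. This is not AID's algorithm: the `JobRun` record, the `ratio` and `busy_fraction` thresholds, the function names, and the sample data are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class JobRun:
    """One entry of a (hypothetical) batch job history log."""
    app: str    # application name
    start: int  # inclusive start, in traffic-log sample intervals
    end: int    # exclusive end, in traffic-log sample intervals


def io_intensive_candidates(runs, traffic, ratio=1.5):
    """Flag apps whose runs coincide with above-average server-side traffic.

    traffic[t] is the aggregate I/O throughput observed in interval t.
    An app becomes a candidate when the mean traffic during its runs is
    at least `ratio` times the overall mean. Both the heuristic and the
    threshold are assumptions made for illustration.
    """
    baseline = mean(traffic)
    samples_by_app = {}
    for run in runs:
        samples_by_app.setdefault(run.app, []).extend(traffic[run.start:run.end])
    return sorted(
        app
        for app, samples in samples_by_app.items()
        if samples and mean(samples) >= ratio * baseline
    )


def recommend(app, candidates, current_load, capacity, busy_fraction=0.7):
    """Suggest "delay" for an I/O-intensive app while storage is busy, else "run"."""
    if app in candidates and current_load >= busy_fraction * capacity:
        return "delay"
    return "run"


# Made-up traffic samples and job history, not measurements from Titan.
traffic = [1, 1, 9, 8, 1, 1, 10, 9, 1, 1]
runs = [
    JobRun("sim_a", 2, 4),
    JobRun("sim_a", 6, 8),
    JobRun("post_b", 0, 2),
    JobRun("post_b", 4, 6),
]
candidates = io_intensive_candidates(runs, traffic)
print(candidates)
print(recommend("sim_a", candidates, current_load=8, capacity=10))
```

In this sketch only `sim_a` is flagged, because its runs line up with the traffic bursts, and it is advised to wait while the load is at 8 of 10 units. A real system would also have to separate the contributions of applications that run concurrently, which this toy example ignores.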

Original language: English
Title of host publication: Proceedings of SC 2016: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Computer Society
Pages: 819-829
Number of pages: 11
ISBN (Electronic): 9781467388153
DOI: https://doi.org/10.1109/SC.2016.69
Publication status: Published - 13 Mar 2017
Event: 2016 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2016 - Salt Lake City, United States
Duration: 13 Nov 2016 to 18 Nov 2016

Fingerprint

Servers
Supercomputers
Scheduling
Real time systems
Degradation

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Software

Cite this

Liu, Y., Gunasekaran, R., Ma, X., & Vazhkudai, S. S. (2017). Server-Side Log Data Analytics for I/O Workload Characterization and Coordination on Large Shared Storage Systems. In Proceedings of SC 2016: The International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 819-829). [7877148] IEEE Computer Society. https://doi.org/10.1109/SC.2016.69

@inproceedings{e7df41cf6fa8479c9ce65af152bc59e7,
title = "Server-Side Log Data Analytics for I/O Workload Characterization and Coordination on Large Shared Storage Systems",
author = "Yang Liu and Raghul Gunasekaran and Xiaosong Ma and Vazhkudai, {Sudharshan S.}",
year = "2017",
month = "3",
day = "13",
doi = "10.1109/SC.2016.69",
language = "English",
pages = "819--829",
booktitle = "Proceedings of SC 2016: The International Conference for High Performance Computing, Networking, Storage and Analysis",
publisher = "IEEE Computer Society",

}
