Automatic, application-aware I/O forwarding resource allocation

Xu Ji, Bin Yang, Tianyu Zhang, Xiaosong Ma, Xiupeng Zhu, Xiyang Wang, Nosayba El-Sayed, Jidong Zhai, Weiguo Liu, Wei Xue

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

The I/O forwarding architecture is widely adopted on modern supercomputers, with a layer of intermediate nodes sitting between the many compute nodes and the backend storage nodes. This allows compute nodes to run more efficiently and stably with a leaner OS, offloads I/O coordination and communication with the backend from the compute nodes, maintains fewer concurrent connections to storage systems, and provides additional resources for effective caching, prefetching, write buffering, and I/O aggregation. However, on many existing machines, each forwarding node is assigned to serve a fixed set of compute nodes. We explore an automatic mechanism, DFRA, for application-adaptive dynamic forwarding resource allocation. We use I/O monitoring data that proves affordable to acquire in real time and to maintain for long-term history analysis. Upon each job’s dispatch, DFRA conducts a history-based study to determine whether the job should be granted more forwarding resources or given dedicated forwarding nodes. Such customized I/O forwarding lets the small fraction of I/O-intensive applications achieve higher I/O performance and scalability, while effectively isolating disruptive I/O activities. We implemented, evaluated, and deployed DFRA on Sunway TaihuLight, the current No. 3 supercomputer in the world. It improves applications’ I/O performance by up to 18.9×, eliminates most of the inter-application I/O interference, and has saved over 200 million core-hours during its 11-month test deployment on TaihuLight. Finally, our proposed DFRA design is not platform-dependent, making it applicable to the management of existing and future I/O forwarding or burst buffer resources.
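The dispatch-time decision described in the abstract can be illustrated with a minimal sketch. This is not DFRA's actual algorithm, which the abstract does not specify: the record schema (`JobRecord`), the constants `DEFAULT_RATIO` and `FWD_NODE_CAPACITY_MBPS`, and the decision rules are all hypothetical, chosen only to show the general shape of a history-based choice between the default mapping, additional forwarding nodes, and dedicated forwarding nodes.

```python
from dataclasses import dataclass
from math import ceil
from typing import List


@dataclass
class JobRecord:
    """One past run of an application (hypothetical monitoring schema)."""
    app: str
    compute_nodes: int
    peak_io_mbps: float   # peak aggregate I/O bandwidth observed for the run
    disruptive: bool      # flagged as interfering with co-located jobs


@dataclass
class Allocation:
    forwarding_nodes: int
    dedicated: bool       # True: forwarding nodes are not shared with other jobs


# Assumed values for illustration only; not taken from the paper.
DEFAULT_RATIO = 512                # compute nodes per forwarding node
FWD_NODE_CAPACITY_MBPS = 2500.0    # usable bandwidth of one forwarding node


def default_allocation(compute_nodes: int) -> Allocation:
    """Static mapping: a fixed number of compute nodes per forwarding node."""
    return Allocation(max(1, ceil(compute_nodes / DEFAULT_RATIO)), False)


def allocate(app: str, compute_nodes: int, history: List[JobRecord]) -> Allocation:
    """Pick forwarding resources for a job from the app's past runs."""
    base = default_allocation(compute_nodes)
    past = [r for r in history if r.app == app]
    if not past:
        return base  # no history: fall back to the static mapping
    if any(r.disruptive for r in past):
        # Isolate jobs whose past runs disturbed their neighbours.
        return Allocation(base.forwarding_nodes, True)
    # Scale the worst observed per-compute-node demand to this job's size.
    per_node = max(r.peak_io_mbps / r.compute_nodes for r in past)
    needed = ceil(per_node * compute_nodes / FWD_NODE_CAPACITY_MBPS)
    if needed > base.forwarding_nodes:
        return Allocation(needed, False)  # grant more forwarding nodes
    return base
```

Under these assumptions, an application whose earlier 1024-node run peaked at 10000 MB/s would receive 8 forwarding nodes for a 2048-node run instead of the default 4, while an application with no history keeps the static mapping.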

Original language: English
Title of host publication: Proceedings of the 17th USENIX Conference on File and Storage Technologies, FAST 2019
Publisher: USENIX Association
Pages: 265-279
Number of pages: 15
ISBN (Electronic): 9781939133090
Publication status: Published - 1 Jan 2019
Event: 17th USENIX Conference on File and Storage Technologies, FAST 2019 - Boston, United States
Duration: 25 Feb 2019 to 28 Feb 2019

Publication series

Name: Proceedings of the 17th USENIX Conference on File and Storage Technologies, FAST 2019

Conference

Conference: 17th USENIX Conference on File and Storage Technologies, FAST 2019
Country: United States
City: Boston
Period: 25/2/19 to 28/2/19

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Ji, X., Yang, B., Zhang, T., Ma, X., Zhu, X., Wang, X., El-Sayed, N., Zhai, J., Liu, W., & Xue, W. (2019). Automatic, application-aware I/O forwarding resource allocation. In Proceedings of the 17th USENIX Conference on File and Storage Technologies, FAST 2019 (pp. 265-279). (Proceedings of the 17th USENIX Conference on File and Storage Technologies, FAST 2019). USENIX Association.