Functional partitioning to optimize end-to-end performance on many-core architectures

Min Li, Sudharshan S. Vazhkudai, Ali R. Butt, Fei Meng, Xiaosong Ma, Youngjae Kim, Christian Engelmann, Galen Shipman

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

41 Citations (Scopus)

Abstract

Scaling computations on emerging massive-core supercomputers is a daunting task, and the significantly lagging I/O capabilities of these systems further degrade applications' end-to-end performance. The I/O bottleneck often negates the potential performance benefits of assigning additional compute cores to an application. In this paper, we address this issue via a novel functional partitioning (FP) runtime environment that allocates cores to specific application tasks (checkpointing, de-duplication, and scientific data format transformation) so that the deluge of cores can be brought to bear on the entire gamut of application activities. The focus is on using the extra cores to support HPC application I/O activities and on leveraging solid-state disks in this context. For example, our evaluation shows that dedicating 1 core on an oct-core machine to checkpointing and its assist tasks using FP can improve the overall execution time of a FLASH benchmark on 80 and 160 cores by 43.95% and 41.34%, respectively.

Original language: English
Title of host publication: 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2010
DOI: https://doi.org/10.1109/SC.2010.28
Publication status: Published - 1 Dec 2010
Externally published: Yes
Event: 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2010 - New Orleans, LA, United States
Duration: 13 Nov 2010 to 19 Nov 2010

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture

Cite this

Li, M., Vazhkudai, S. S., Butt, A. R., Meng, F., Ma, X., Kim, Y., ... Shipman, G. (2010). Functional partitioning to optimize end-to-end performance on many-core architectures. In 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2010 [5644890] https://doi.org/10.1109/SC.2010.28

@inproceedings{fbacb994ba2246fbb5f2f45dc90ba4a3,
title = "Functional partitioning to optimize end-to-end performance on many-core architectures",
author = "Min Li and Vazhkudai, {Sudharshan S.} and Butt, {Ali R.} and Fei Meng and Xiaosong Ma and Youngjae Kim and Christian Engelmann and Galen Shipman",
year = "2010",
month = "12",
day = "1",
doi = "10.1109/SC.2010.28",
language = "English",
isbn = "9781424475575",
booktitle = "2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2010",

}

TY - GEN

T1 - Functional partitioning to optimize end-to-end performance on many-core architectures

AU - Li, Min

AU - Vazhkudai, Sudharshan S.

AU - Butt, Ali R.

AU - Meng, Fei

AU - Ma, Xiaosong

AU - Kim, Youngjae

AU - Engelmann, Christian

AU - Shipman, Galen

PY - 2010/12/1

Y1 - 2010/12/1

UR - http://www.scopus.com/inward/record.url?scp=78650851238&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78650851238&partnerID=8YFLogxK

U2 - 10.1109/SC.2010.28

DO - 10.1109/SC.2010.28

M3 - Conference contribution

SN - 9781424475575

BT - 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2010

ER -