Dynamic sharing of GPUs in cloud systems

Khaled M. Diab, M. Mustafa Rafique, Mohamed Hefeeda

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

8 Citations (Scopus)

Abstract

The use of computational accelerators, specifically programmable GPUs, is becoming popular in cloud computing environments. Cloud vendors currently provide GPUs as dedicated resources to cloud users, which may result in under-utilization of the expensive GPU resources. In this work, we propose gCloud, a framework to provide GPUs as on-demand computing resources to cloud users. gCloud allows on-demand access to local and remote GPUs to cloud users only when the target GPU kernel is ready for execution. In order to improve the utilization of GPUs, gCloud efficiently shares the GPU resources among concurrent applications from different cloud users. Moreover, it reduces the inter-application interference of concurrent kernels for GPU resources by considering the local and global memory, number of threads, and the number of thread blocks of each kernel. It schedules concurrent kernels on available GPUs such that the overall inter-application interference across the cluster is minimal. We implemented gCloud as an independent module, and integrated it with the OpenStack cloud computing platform. Evaluation of gCloud using representative applications shows that it improves the utilization of GPU resources by 56.3% on average compared to the current state-of-the-art systems that serialize GPU kernel executions. Moreover, gCloud significantly reduces the completion time of GPU applications, e.g., in our experiments of running a mix of 8 to 28 GPU applications on 4 NVIDIA Tesla GPUs, gCloud achieves up to 430% reduction in the total completion time.
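The abstract describes scheduling concurrent kernels onto GPUs so that inter-application interference, estimated from each kernel's memory footprint, thread count, and thread-block count, is minimized across the cluster. As an illustrative sketch only (the class names, fields, and scoring formula below are assumptions for exposition, not the paper's actual interference model), a greedy interference-aware placement might look like:

```python
from dataclasses import dataclass, field

@dataclass
class Kernel:
    name: str
    global_mem: int    # bytes of global memory the kernel needs
    threads: int       # threads per block
    thread_blocks: int # number of thread blocks

@dataclass
class GPU:
    name: str
    mem_capacity: int  # total global memory
    max_threads: int   # total concurrent thread capacity
    kernels: list = field(default_factory=list)

    def interference(self, k: Kernel) -> float:
        # Hypothetical score: fraction of memory and thread capacity
        # that would be committed if kernel k were co-located here.
        mem = sum(x.global_mem for x in self.kernels) + k.global_mem
        thr = (sum(x.threads * x.thread_blocks for x in self.kernels)
               + k.threads * k.thread_blocks)
        return mem / self.mem_capacity + thr / self.max_threads

def schedule(kernels: list, gpus: list) -> dict:
    """Greedy placement: each kernel goes to the GPU with the lowest
    projected interference score; returns {kernel name: GPU name}."""
    placement = {}
    for k in sorted(kernels, key=lambda k: k.global_mem, reverse=True):
        best = min(gpus, key=lambda g: g.interference(k))
        best.kernels.append(k)
        placement[k.name] = best.name
    return placement
```

With this toy score, two memory-heavy kernels naturally land on different GPUs, which captures the spirit of spreading interfering workloads across the cluster; the paper's actual policy and cost model may differ substantially.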

Original language: English
Title of host publication: Proceedings - IEEE 27th International Parallel and Distributed Processing Symposium Workshops and PhD Forum, IPDPSW 2013
Publisher: IEEE Computer Society
Pages: 947-954
Number of pages: 8
DOI: 10.1109/IPDPSW.2013.102
Publication status: Published - 1 Jan 2013
Event: 2013 IEEE 27th International Parallel and Distributed Processing Symposium Workshops and PhD Forum, IPDPSW 2013 - Boston, MA, United States
Duration: 20 May 2013 - 24 May 2013



ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software
  • Theoretical Computer Science

Cite this

Diab, K. M., Rafique, M. M., & Hefeeda, M. (2013). Dynamic sharing of GPUs in cloud systems. In Proceedings - IEEE 27th International Parallel and Distributed Processing Symposium Workshops and PhD Forum, IPDPSW 2013 (pp. 947-954). [6650978] IEEE Computer Society. https://doi.org/10.1109/IPDPSW.2013.102
