Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems

Dheevatsa Mudigere, Srinivas Sridharan, Anand Deshpande, Jongsoo Park, Alexander Heinecke, Mikhail Smelyanskiy, Bharat Kaul, Pradeep Dubey, Dinesh Kaushik, David Keyes

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

6 Citations (Scopus)

Abstract

In this work, we revisit the 1999 Gordon Bell Prize-winning PETSc-FUN3D aerodynamics code, extending it with highly tuned shared-memory parallelization and detailed performance analysis on modern highly parallel architectures. An unstructured-grid implicit flow solver, which forms the backbone of computational aerodynamics, poses particular challenges due to its large irregular working sets, unstructured memory accesses, and variable/limited amount of parallelism. This code, based on a domain decomposition approach, exposes tradeoffs between the number of threads assigned to each MPI-rank subdomain and the total number of domains. By applying several algorithm- and architecture-aware optimization techniques for unstructured grids, we show a 6.9X speed-up in performance on a single-node Intel® Xeon™ E5-2690v2 processor relative to the out-of-the-box compilation. Our scaling studies on the TACC Stampede supercomputer show that our optimizations continue to provide performance benefits over the baseline implementation as we scale up to 256 nodes.

Original language: English
Title of host publication: Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium, IPDPS 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 723-732
Number of pages: 10
ISBN (Electronic): 9781479986484
DOIs: https://doi.org/10.1109/IPDPS.2015.114
Publication status: Published - 17 Jul 2015
Event: 29th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2015 - Hyderabad, India
Duration: 25 May 2015 to 29 May 2015

Other

Other: 29th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2015
Country: India
City: Hyderabad
Period: 25/5/15 to 29/5/15

Fingerprint

Computational fluid dynamics
Data storage equipment
Parallel architectures
Supercomputers
Aerodynamics
Decomposition

Keywords

  • CFD
  • Krylov Solver
  • Multi-core
  • OpenMP+MPI

ASJC Scopus subject areas

  • Computer Networks and Communications

Cite this

Mudigere, D., Sridharan, S., Deshpande, A., Park, J., Heinecke, A., Smelyanskiy, M., ... Keyes, D. (2015). Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems. In Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium, IPDPS 2015 (pp. 723-732). [7161559] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IPDPS.2015.114

@inproceedings{3e16302ccbfe40f680dc0b2b68b3586a,
title = "Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems",
abstract = "In this work, we revisit the 1999 Gordon Bell Prize-winning PETSc-FUN3D aerodynamics code, extending it with highly tuned shared-memory parallelization and detailed performance analysis on modern highly parallel architectures. An unstructured-grid implicit flow solver, which forms the backbone of computational aerodynamics, poses particular challenges due to its large irregular working sets, unstructured memory accesses, and variable/limited amount of parallelism. This code, based on a domain decomposition approach, exposes tradeoffs between the number of threads assigned to each MPI-rank subdomain and the total number of domains. By applying several algorithm- and architecture-aware optimization techniques for unstructured grids, we show a 6.9X speed-up in performance on a single-node Intel{\circledR} Xeon™ E5-2690v2 processor relative to the out-of-the-box compilation. Our scaling studies on the TACC Stampede supercomputer show that our optimizations continue to provide performance benefits over the baseline implementation as we scale up to 256 nodes.",
keywords = "CFD, Krylov Solver, Multi-core, OpenMP+MPI",
author = "Dheevatsa Mudigere and Srinivas Sridharan and Anand Deshpande and Jongsoo Park and Alexander Heinecke and Mikhail Smelyanskiy and Bharat Kaul and Pradeep Dubey and Dinesh Kaushik and David Keyes",
year = "2015",
month = "7",
day = "17",
doi = "10.1109/IPDPS.2015.114",
language = "English",
pages = "723--732",
booktitle = "Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium, IPDPS 2015",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN
T1 - Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems
AU - Mudigere, Dheevatsa
AU - Sridharan, Srinivas
AU - Deshpande, Anand
AU - Park, Jongsoo
AU - Heinecke, Alexander
AU - Smelyanskiy, Mikhail
AU - Kaul, Bharat
AU - Dubey, Pradeep
AU - Kaushik, Dinesh
AU - Keyes, David
PY - 2015/7/17
Y1 - 2015/7/17
AB - In this work, we revisit the 1999 Gordon Bell Prize-winning PETSc-FUN3D aerodynamics code, extending it with highly tuned shared-memory parallelization and detailed performance analysis on modern highly parallel architectures. An unstructured-grid implicit flow solver, which forms the backbone of computational aerodynamics, poses particular challenges due to its large irregular working sets, unstructured memory accesses, and variable/limited amount of parallelism. This code, based on a domain decomposition approach, exposes tradeoffs between the number of threads assigned to each MPI-rank subdomain and the total number of domains. By applying several algorithm- and architecture-aware optimization techniques for unstructured grids, we show a 6.9X speed-up in performance on a single-node Intel® Xeon™ E5-2690v2 processor relative to the out-of-the-box compilation. Our scaling studies on the TACC Stampede supercomputer show that our optimizations continue to provide performance benefits over the baseline implementation as we scale up to 256 nodes.
KW - CFD
KW - Krylov Solver
KW - Multi-core
KW - OpenMP+MPI
UR - http://www.scopus.com/inward/record.url?scp=84971375871&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84971375871&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2015.114
DO - 10.1109/IPDPS.2015.114
M3 - Conference contribution
AN - SCOPUS:84971375871
SP - 723
EP - 732
BT - Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium, IPDPS 2015
PB - Institute of Electrical and Electronics Engineers Inc.
ER -