A new approach to load balance for parallel/compositional simulation based on reservoir-model overdecomposition

Yuhe Wang, John E. Killough

Research output: Contribution to journal (Article)

8 Citations (Scopus)

Abstract

The quest for efficient and scalable parallel reservoir simulators has been evolving with the advancement of high-performance computing architectures. Among the various challenges of efficiency and scalability, load imbalance is a major obstacle that has not been fully addressed and solved. The causes of load imbalance in parallel reservoir simulation are both static and dynamic. Robust graph-partitioning algorithms are capable of handling static load imbalance by decomposing the underlying reservoir geometry to distribute a roughly equal load to each processor. However, these loads that are determined by a static load balancer seldom remain unchanged as the simulation proceeds in time. This so-called dynamic imbalance can be exacerbated further in parallel compositional simulations. The flash calculations for equations of state (EOSs) in complex compositional simulations not only can consume more than half of the total execution time but also are difficult to balance merely by a static load balancer. The computational cost of flash calculations in each gridblock heavily depends on the dynamic data such as pressure, temperature, and hydrocarbon composition. Thus, any static assignment of gridblocks may lead to dynamic load imbalance in unpredictable manners. A dynamic load balancer can often provide solutions for this difficulty. However, traditional techniques are inflexible and tedious to implement in legacy reservoir simulators. In this paper, we present a new approach to address dynamic load imbalance in parallel compositional simulation. It overdecomposes the reservoir model to assign each processor a bundle of subdomains. Processors treat these bundles of subdomains as virtual processes or user-level migratable threads that can be dynamically migrated across processors in the run-time system. This technique is shown to be capable of achieving better overlap between computation and communication for cache efficiency. We use this approach in a legacy reservoir simulator and demonstrate a reduction in the execution time of parallel compositional simulations while requiring minimal changes to the source code. Finally, it is shown that domain overdecomposition, together with a load balancer, can improve speedup from 29.27 to 62.38 on 64 physical processors for a realistic simulation problem.
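The idea of overdecomposition with migration can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not name the run-time system or the balancing strategy, so the greedy "heaviest subdomain to the least-loaded processor" heuristic below is a swapped-in stand-in, and every function name and cost value is hypothetical. The last helper simply divides speedup by processor count; applied to the reported figures, it gives a parallel efficiency of roughly 46% before and 97% after on 64 processors.

```python
import heapq
from typing import List, Sequence


def block_assignment(costs: Sequence[float], n_procs: int) -> List[int]:
    """Static baseline: each processor owns one contiguous bundle of subdomains."""
    chunk = -(-len(costs) // n_procs)  # ceiling division
    return [min(i // chunk, n_procs - 1) for i in range(len(costs))]


def greedy_rebalance(costs: Sequence[float], n_procs: int) -> List[int]:
    """Hypothetical balancer: place the heaviest subdomain first,
    always on the processor with the smallest load so far."""
    heap = [(0.0, p) for p in range(n_procs)]  # (current load, processor id)
    heapq.heapify(heap)
    owner = [0] * len(costs)
    for sub in sorted(range(len(costs)), key=lambda i: -costs[i]):
        load, proc = heapq.heappop(heap)
        owner[sub] = proc
        heapq.heappush(heap, (load + costs[sub], proc))
    return owner


def max_load(costs: Sequence[float], owner: Sequence[int], n_procs: int) -> float:
    """Load of the busiest processor; this bounds the time per step."""
    loads = [0.0] * n_procs
    for cost, proc in zip(costs, owner):
        loads[proc] += cost
    return max(loads)


def parallel_efficiency(speedup: float, n_procs: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / n_procs


# Made-up per-subdomain flash costs: 16 subdomains on 4 processors,
# with the expensive (e.g. two-phase) region clustered in the first four.
costs = [8, 8, 8, 8] + [1] * 12
static_owner = block_assignment(costs, 4)
balanced_owner = greedy_rebalance(costs, 4)

print(max_load(costs, static_owner, 4))    # busiest processor, static bundles
print(max_load(costs, balanced_owner, 4))  # busiest processor after rebalancing
print(parallel_efficiency(29.27, 64), parallel_efficiency(62.38, 64))
```

With only one subdomain per processor there would be nothing to move; having several subdomains per processor is what gives a balancer units small enough to migrate when the measured costs drift during the run.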

Original language: English
Pages (from-to): 304-315
Number of pages: 12
Journal: SPE Journal
Volume: 19
Issue number: 2
Publication status: Published - 2014

Fingerprint

Dynamic loads
Simulators
Simulation
Equations of state
Scalability
Hydrocarbons
Geometry
Communication
Chemical analysis
Costs
Partitioning
Temperature

ASJC Scopus subject areas

  • Geotechnical Engineering and Engineering Geology
  • Energy Engineering and Power Technology

Cite this

A new approach to load balance for parallel/compositional simulation based on reservoir-model overdecomposition. / Wang, Yuhe; Killough, John E.

In: SPE Journal, Vol. 19, No. 2, 2014, p. 304-315.

@article{1a93a113d18a400a8a539135c368aa21,
title = "A new approach to load balance for parallel/compositional simulation based on reservoir-model overdecomposition",
author = "Yuhe Wang and Killough, {John E.}",
year = "2014",
language = "English",
volume = "19",
pages = "304--315",
journal = "SPE Journal",
issn = "1086-055X",
publisher = "Society of Petroleum Engineers (SPE)",
number = "2",

}

TY - JOUR

T1 - A new approach to load balance for parallel/compositional simulation based on reservoir-model overdecomposition

AU - Wang, Yuhe

AU - Killough, John E.

PY - 2014

Y1 - 2014

UR - http://www.scopus.com/inward/record.url?scp=84901228694&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84901228694&partnerID=8YFLogxK

M3 - Article

VL - 19

SP - 304

EP - 315

JO - SPE Journal

JF - SPE Journal

SN - 1086-055X

IS - 2

ER -