PDE Solvers for Hybrid CPU-GPU Architectures

Malahe, Michael

Last Modified

March 20, 2019

Creator

Malahe, Michael
- Affiliation: College of Arts and Sciences, Department of Mathematics

Abstract

Many problems of scientific and industrial interest are investigated by numerically solving partial differential equations (PDEs). For some of these problems, the scope of the investigation is limited by the cost of computational resources. A new approach to reducing this cost is the use of coprocessors, such as graphics processing units (GPUs) and Many Integrated Core (MIC) cards, which can execute floating-point operations at a higher rate than a central processing unit (CPU) of the same cost. They achieve this by placing a large number of processors in a single device, with very limited dedicated memory per thread.

Codes for a number of continuum methods, such as boundary element methods (BEM), finite element methods (FEM), and finite difference methods (FDM), have already been implemented on coprocessor architectures. These methods were designed before the adoption of coprocessor architectures, so implementing them efficiently with reduced thread-level memory can be challenging. Other methods, such as Monte Carlo methods (MCM) and lattice Boltzmann methods (LBM) for kinetic formulations of PDEs, do operate efficiently with limited thread-level memory, but they are not competitive on CPUs and generally have poorer convergence than the continuum methods.

In this work, we introduce a class of methods that combines the parallelism of kinetic formulations on GPUs with the better convergence of continuum methods on CPUs. We first extend an existing Feynman-Kac formulation for determining the principal eigenpair of an elliptic operator so that it can retrieve arbitrarily many eigenpairs. This new method is implemented for multiple GPUs and combined with a standard deflation preconditioner on multiple CPUs to create a hybrid concurrent method whose convergence is superior to that of the deflation preconditioner alone. The hybrid method exhibits good parallelism, with an efficiency of 80% on a problem with 300 million unknowns, run on a configuration of 324 CPU cores and 54 GPUs.
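
To illustrate why Monte Carlo methods for probabilistic formulations of PDEs need so little memory per thread, the following is a minimal sketch on a toy problem. It is not the dissertation's method. It assumes the one-dimensional Laplace equation u'' = 0 on (0, 1) with u(0) = 0 and u(1) = 1, discretized on a uniform grid, where the solution at a grid point equals the probability that a symmetric random walk started there reaches the right end first. The function names and parameters are hypothetical.

```python
import random


def walk_to_boundary(start, n_cells, rng):
    """Run one symmetric random walk on the grid 0..n_cells.

    The only state a walker carries is its current index, which is why
    this kind of method suits hardware with little memory per thread.
    Returns the boundary index (0 or n_cells) where the walk exits.
    """
    k = start
    while 0 < k < n_cells:
        k += 1 if rng.random() < 0.5 else -1
    return k


def estimate_u(start, n_cells, n_walkers, seed=0):
    """Estimate u(start / n_cells) for u'' = 0, u(0) = 0, u(1) = 1.

    Each walker contributes the boundary value at its exit point
    (0 on the left, 1 on the right), so the estimate is the fraction
    of walkers that exit on the right. Walkers are independent and
    could each run on a separate thread.
    """
    rng = random.Random(seed)
    hits = sum(
        walk_to_boundary(start, n_cells, rng) == n_cells
        for _ in range(n_walkers)
    )
    return hits / n_walkers


if __name__ == "__main__":
    # The exact solution is u(x) = x, so the estimate should be near 0.3.
    print(estimate_u(start=3, n_cells=10, n_walkers=20000))
```

The statistical error of such an estimate shrinks only like the inverse square root of the number of walkers, which is one example of the poorer convergence that the abstract attributes to these methods compared with continuum methods.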

Date of publication

August 2016

DOI

https://doi.org/10.17615/m99f-wd41

Resource type

Dissertation

Rights statement

In Copyright

Advisor

Kimbell, Julia
Huang, Jingfang
McLaughlin, Richard
Mitran, Sorin
Griffith, Boyce

Degree

Doctor of Philosophy

Degree granting institution

University of North Carolina at Chapel Hill Graduate School

Graduation year

2016

Language

English

Items

Title	Date Uploaded	Visibility
Malahe_unc_0153D_16486.pdf	2019-04-11	Public
