Case Studies on Optimizing Algorithms for GPU Architectures
MLA
Brown, Shawn. Case Studies on Optimizing Algorithms for GPU Architectures. Chapel Hill, NC: University of North Carolina at Chapel Hill Graduate School, 2015. https://doi.org/10.17615/jybr-f558
APA
Brown, S. (2015). Case Studies on Optimizing Algorithms for GPU Architectures. Chapel Hill, NC: University of North Carolina at Chapel Hill Graduate School. https://doi.org/10.17615/jybr-f558
Chicago
Brown, Shawn. 2015. Case Studies on Optimizing Algorithms for GPU Architectures. Chapel Hill, NC: University of North Carolina at Chapel Hill Graduate School. https://doi.org/10.17615/jybr-f558
- Last Modified
- March 19, 2019
- Creator
- Brown, Shawn
- Affiliation: College of Arts and Sciences, Department of Computer Science
- Abstract
- Modern GPUs are complex, massively multi-threaded, and high-performance. Programmers naturally gravitate toward this high performance to achieve faster results. To do so successfully, however, they must first understand and then master a new set of skills – writing parallel code, using different types of parallelism, adapting to GPU architectural features, and understanding the issues that limit performance. To ease this learning process and help GPU programmers become productive more quickly, this dissertation introduces three data access skeletons (DASks) – Block, Column, and Row – and two block access skeletons (BASks) – Block-by-Block and Warp-by-Warp. Each "skeleton" provides a high-performance implementation framework that partitions data arrays into data blocks and then iterates over those blocks. The programmer must still write "body" methods on individual data blocks to solve their specific problem. These skeletons provide efficient, machine-dependent data access patterns for use on GPUs. DASks group n data elements into m fixed-size data blocks; these m data blocks are then partitioned across p thread blocks using a 1D or 2D layout pattern. The fixed-size data blocks are parameterized by three C++ template parameters – nWork, WarpSize, and nWarps. Generic programming techniques use these three parameters to enable performance experiments on three different types of parallelism – instruction-level parallelism (ILP), data-level parallelism (DLP), and thread-level parallelism (TLP). These DASks and BASks are introduced through a simple memory I/O (Copy) case study. A nearest neighbor search case study motivated the development of DASks and BASks but does not use the skeletons itself. Three additional case studies – Reduce/Scan, Histogram, and Radix Sort – demonstrate DASks and BASks in action on parallel primitives and provide further valuable performance lessons.
- Date of publication
- May 2015
- Keyword
- Subject
- DOI
- Identifier
- Resource type
- Rights statement
- In Copyright
- Advisor
- Prins, Jan
- Manocha, Dinesh
- Snoeyink, Jack
- Nyland, Lars
- Lastra, Anselmo
- Degree
- Doctor of Philosophy
- Degree granting institution
- University of North Carolina at Chapel Hill Graduate School
- Graduation year
- 2015
- Language
- Publisher
- Place of publication
- Chapel Hill, NC
- Access right
- There are no restrictions to this item.
- Date uploaded
- August 25, 2015
Relations
- Parents: This work has no parents.
Items
| Title | Date Uploaded | Visibility | Actions |
|---|---|---|---|
| Brown_unc_0153D_15479.pdf | 2019-04-10 | Public | Download |