Case Studies on Optimizing Algorithms for GPU Architectures
MLA
Brown, Shawn. Case Studies on Optimizing Algorithms for GPU Architectures. Chapel Hill, NC: University of North Carolina at Chapel Hill Graduate School, 2015. https://doi.org/10.17615/jybr-f558
APA
Brown, S. (2015). Case Studies on Optimizing Algorithms for GPU Architectures. Chapel Hill, NC: University of North Carolina at Chapel Hill Graduate School. https://doi.org/10.17615/jybr-f558
Chicago
Brown, Shawn. 2015. Case Studies on Optimizing Algorithms for GPU Architectures. Chapel Hill, NC: University of North Carolina at Chapel Hill Graduate School. https://doi.org/10.17615/jybr-f558
- Last Modified
- March 19, 2019
- Creator
- Brown, Shawn
- Affiliation: College of Arts and Sciences, Department of Computer Science
- Abstract
- Modern GPUs are complex, massively multi-threaded, and high-performance. Programmers naturally gravitate toward this high performance to achieve faster results. To do so successfully, however, they must first understand and then master a new set of skills – writing parallel code, using different types of parallelism, adapting to GPU architectural features, and understanding the issues that limit performance. To ease this learning process and help GPU programmers become productive more quickly, this dissertation introduces three data access skeletons (DASks) – Block, Column, and Row – and two block access skeletons (BASks) – Block-by-Block and Warp-by-Warp. Each "skeleton" provides a high-performance implementation framework that partitions data arrays into data blocks and then iterates over those blocks. The programmer must still write "body" methods on individual data blocks to solve their specific problem. These skeletons provide efficient, machine-dependent data access patterns for use on GPUs. DASks group n data elements into m fixed-size data blocks; these m data blocks are then partitioned across p thread blocks using a 1D or 2D layout pattern. The fixed-size data blocks are parameterized by three C++ template parameters – nWork, WarpSize, and nWarps. Generic programming techniques use these three parameters to enable performance experiments on three different types of parallelism – instruction-level parallelism (ILP), data-level parallelism (DLP), and thread-level parallelism (TLP). These DASks and BASks are introduced through a simple memory I/O (Copy) case study. A nearest neighbor search case study motivated the development of DASks and BASks but does not use the skeletons itself. Three additional case studies – Reduce/Scan, Histogram, and Radix Sort – demonstrate DASks and BASks in action on parallel primitives and provide further valuable performance lessons.
- Date of publication
- May 2015
- Keyword
- Subject
- DOI
- Identifier
- Resource type
- Rights statement
- In Copyright
- Advisor
- Prins, Jan
- Manocha, Dinesh
- Snoeyink, Jack
- Nyland, Lars
- Lastra, Anselmo
- Degree
- Doctor of Philosophy
- Degree granting institution
- University of North Carolina at Chapel Hill Graduate School
- Graduation year
- 2015
- Language
- Publisher
- Place of publication
- Chapel Hill, NC
- Access right
- There are no restrictions to this item.
- Date uploaded
- August 25, 2015
Relations
- Parents: This work has no parents.
Items
| Title | Date Uploaded | Visibility | Actions |
|---|---|---|---|
| Brown_unc_0153D_15479.pdf | 2019-04-10 | Public | Download |