RUNTIME METHODS TO IMPROVE ENERGY EFFICIENCY IN SUPERCOMPUTING APPLICATIONS Public Deposited

Downloadable Content

Download PDF
Last Modified
  • March 20, 2019
Creator
  • Bhalachandra, Sridutt
    • Affiliation: College of Arts and Sciences, Department of Computer Science
Abstract
  • Energy efficiency in supercomputing is critical to limit operating costs and carbon footprints. While the energy efficiency of future supercomputing centers needs to improve at all levels, the energy consumed by the processing units is a large fraction of the total energy consumed by High Performance Computing (HPC) systems. HPC applications use a parallel programming paradigm like the Message Passing Interface (MPI) to coordinate computation and communication among thousands of processors. With dynamically-changing factors both in hardware and software affecting energy usage of processors, there exists a need for power monitoring and regulation at runtime to achieve savings in energy. This dissertation highlights an adaptive runtime framework that enables processors with core-specific power control by dynamically adapting to workload characteristics to reduce power with little or no performance impact. Two opportunities to improve the energy efficiency of processors running MPI applications are identified - computational workload imbalance and waiting on memory. Monitoring of performance and power regulation is performed by the framework transparently within the MPI runtime system, eliminating the need for code changes to MPI applications. The effect of enforcing power limits (capping) on processors is also investigated. Experiments on 32 nodes (1024 cores) show that in presence of workload imbalance, the runtime reduces Central Processing Unit (CPU) frequency on cores not on the critical path, thereby reducing power and hence energy usage without deteriorating performance. Using this runtime, six MPI mini-applications and a full MPI application show an overall 20% decrease in energy use with less than 1% increase in execution time. In addition, the lowering of frequency on non-critical cores reduces run-to-run performance variation and improves performance. For the full application, an average speedup of 11% is seen, while the power is lowered by about 31% for an energy savings of up to 42%. Another experiment on 16 nodes (256 cores) that are power capped also shows performance improvement along with power reduction. Thus, energy optimization can also be a performance optimization. For applications that are limited by memory access times, memory metrics identified facilitate lowering of power by up to 32% without adversely impacting performance.
Date of publication
Keyword
DOI
Resource type
Advisor
  • Fowler, Robert
  • Olivier, Stephen
  • Porterfield, Allan
  • Prins, Jan
  • Singh, Montek
Degree
  • Doctor of Philosophy
Degree granting institution
  • University of North Carolina at Chapel Hill Graduate School
Graduation year
  • 2018
Language
Parents:

This work has no parents.

Items