Harnessing Heterogeneity To Improve Patient Outcomes

Last Modified
  • March 20, 2019
Creator
  • Hibbard, Jonathan
    • Affiliation: Gillings School of Global Public Health, Department of Biostatistics
Abstract
  • We investigate methods of improving medical outcomes by exploiting heterogeneity, with a focus on actual implementation. Advances in data-mining and big-data methods have opened up new and exciting opportunities to change the nature of statistical medical research. Whereas traditional scientific experimentation has attempted to eliminate sources of variability beyond a small set of variables of interest, machine-learning techniques that extract weak and complex signals from noisy data now make it possible to handle heterogeneous experiments and subjects. We propose that, viewed through the lens of these modern machine-learning methods, heterogeneous and highly variable data should be regarded as a boon, not a nuisance. In particular, such data allow for the investigation and construction of individualized treatment rules for patients, that is, for the advance of precision medicine. We explore two facets of this view. First, we consider the practical design and implementation of data-collection experiments that allow a machine-learning approach whilst simultaneously permitting a traditional experimental view, in order to satisfy investigators from both paradigms. We reference a particular example, the design of a clinical trial investigating the optimal treatment of burns patients (the LIBERTI trial). This example highlights particular challenges (statistical, philosophical and logistical) that arise when bridging traditional and modern paradigms, and hopefully some corresponding solutions. Whilst we present our design as an initial solution, the attempted implementation of this trial reveals particular aspects that are apt for further improvement, which we then explore.
    Second, we investigate methods to combine traditional clustering techniques and make them effective in higher-dimensional data with weak signals, where existing techniques may fail. Motivated by an example of data from COPD sufferers (the SPIROMICS study), we attempt to develop ways of combining more traditional methods with a machine-learning approach, and fuzzier data-mining methods with ones permitting better inference. We illustrate our methods on Fisher's Iris data and the Wisconsin Breast Cancer data set. We explore extensions of the traditional Gaussian mixture model to more general log-concave distributions and highlight what should be interesting theory for such approximations.
Date of publication
Resource type
Rights statement
  • In Copyright
Advisor
  • Kosorok, Michael
Degree
  • Doctor of Philosophy
Degree granting institution
  • University of North Carolina at Chapel Hill Graduate School
Graduation year
  • 2017
