Semiparametric and nonparametric methods in data mining and statistical learning with applications in public health surveillance and personalized medicine

Last Modified
  • March 22, 2019
  • Zhao, Yingqi
    • Affiliation: Gillings School of Global Public Health, Department of Biostatistics
  • The field of statistical learning has been growing rapidly over the past few decades, with a diverse range of applications. In this dissertation, we develop methodology mainly using semiparametric and nonparametric statistical learning techniques for the areas of public health surveillance and personalized medicine. Surveillance, providing early warning for impending emergencies, is a key function of public health. In Chapter 2, we propose a semiparametric method for modeling spatiotemporal lattice data via local linear fitting combined with day-of-week effects, in which both spatial and temporal information are taken into account. Detection of abnormal events is carried out using an ARMA time series technique for the residuals, combined with a resampling approach to determine the threshold for significance. We conduct simulations to assess the performance of the proposed method. The method is also illustrated using data on daily asthma admissions collected through North Carolina emergency departments between 2006 and 2007. There is increasing interest in personalized medicine: the idea of tailoring treatment for each individual to optimize patient outcome. In Chapter 3, we focus on the single-decision setup. We show that estimating such an optimal treatment rule is equivalent to a classification problem in which each subject is weighted in proportion to his or her clinical outcome, although the true class labels (the treatment that is optimal for each patient) are unknown in the training set. We then propose a new approach based on the support vector machine framework from computer science. We show that the resulting estimator of the treatment rule is consistent, and further derive fairly accurate convergence rates for this estimator. The performance of the proposed approach is demonstrated via simulation studies and an analysis of chronic depression data.
It is not uncommon that the best clinical strategies require adaptation over time. In Chapter 4, we therefore generalize the outcome weighted learning method to the multi-decision setup. The aim is to find dynamic treatment regimes, that is, customized sequential decision rules for individual patients that adapt over time to the evolving illness, so as to maximize the long-term health outcome. Inspired by the idea underlying dynamic programming, we conduct outcome weighted learning for each stage backward through time. We further introduce an iterative procedure that can improve the performance of the algorithm. The methods are evaluated by simulation studies and an analysis of a smoking cessation data set.
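The Chapter 3 claim that estimating an optimal treatment rule is equivalent to an outcome-weighted classification problem can be illustrated with a small sketch. Everything here is an assumption made for illustration, not the dissertation's implementation: the toy records are invented, treatments are coded as -1/+1, outcomes are taken to be nonnegative, randomization is assumed to be 1:1 (propensity 0.5), and the names `ipw_value`, `weighted_error`, `sign_rule`, and `always_treat` are hypothetical. The sketch checks only that the inverse-probability-weighted value of a rule plus its outcome-weighted misclassification error is a constant that does not depend on the rule, so maximizing one is the same as minimizing the other.

```python
from typing import Callable, List, Tuple

Record = Tuple[float, int, float]  # (covariate x, treatment a in {-1, +1}, outcome r >= 0)

# Hypothetical toy data, invented for illustration only.
DATA: List[Record] = [
    (0.2, 1, 3.0),
    (0.8, -1, 1.0),
    (-0.5, 1, 0.5),
    (-0.9, -1, 4.0),
    (0.4, -1, 2.0),
    (-0.1, 1, 1.5),
]
PROPENSITY = 0.5  # assumption: 1:1 randomization, so P(A = a | X) = 0.5


def ipw_value(rule: Callable[[float], int], data: List[Record],
              propensity: float = PROPENSITY) -> float:
    """Inverse-probability-weighted estimate of the mean outcome if everyone followed `rule`."""
    return sum(r / propensity for x, a, r in data if rule(x) == a) / len(data)


def weighted_error(rule: Callable[[float], int], data: List[Record],
                   propensity: float = PROPENSITY) -> float:
    """Outcome-weighted misclassification: the received treatment is the label,
    and each subject carries the weight r / propensity."""
    return sum(r / propensity for x, a, r in data if rule(x) != a) / len(data)


def sign_rule(x: float) -> int:
    """Candidate rule: treat (+1) when the covariate is positive."""
    return 1 if x > 0 else -1


def always_treat(x: float) -> int:
    """Candidate rule: treat everyone."""
    return 1


# The sum of all weights is the same whichever rule is evaluated.
total_weight = sum(r / PROPENSITY for _, _, r in DATA) / len(DATA)

for rule in (sign_rule, always_treat):
    value = ipw_value(rule, DATA)
    error = weighted_error(rule, DATA)
    # value + error equals total_weight, so a higher value means a lower weighted error.
    print(rule.__name__, round(value, 4), round(error, 4), round(value + error, 4))
```

The sketch stops at the 0-1 weighted error. The abstract states that the proposed estimator builds on the support vector machine framework; that estimator is not implemented here.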
Date of publication
Resource type
Rights statement
  • In Copyright
  • ... in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Biostatistics.
  • Kosorok, Michael
