Applications of Independence Statistics to Goodness-Of-Fit, Multivariate Change Point Estimation and Clustering of Variables

Last Modified
  • March 20, 2019
  • Teran Hidalgo, Sebastian
    • Affiliation: Gillings School of Global Public Health, Department of Biostatistics
  • Independence statistics measure the statistical dependence between two random vectors of general dimension and type. They do not assume a specific form of dependence and are sensitive to all forms of departure from independence. This dissertation extends the use of independence statistics to three settings. In the first part, we develop a goodness-of-fit test for smoothing spline ANOVA models, a nonparametric regression methodology with the useful property that the contribution of the covariates can be decomposed in an ANOVA fashion. The proposed method derives estimated residuals from the model and then uses independence statistics to evaluate the statistical dependence between the estimated residuals and the covariates. If no dependence is detected, there is no evidence that the model fits the data poorly. Application of the method is demonstrated with an analysis of neonatal mental development data. In the second part, we develop a method for the change point problem in which two sets of random vectors are observed sequentially over a dimension, but at some unknown point the relationship between the two vectors changes. We propose a methodology to estimate the unknown change point without assuming a model. This is accomplished by assessing, with an independence statistic, the strength of the association before and after each possible change point. A test for the hypothesis that a change point exists is also developed. We demonstrate its use with blood glucose and physical activity measurements on an individual with type 1 diabetes. In the third part, we develop a method for hierarchical clustering of variables that controls the type I error rate, which common clustering methods do not do. We accomplish this by turning the decision of whether to join two clusters into a hypothesis testing problem. The strength of our method is shown by clustering genes from single-cell data coming from different tumors.
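
The residual-based goodness-of-fit idea in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation. It makes several assumptions that are not in the record: distance covariance stands in for the independence statistic, ordinary polynomial least squares stands in for the smoothing spline ANOVA model, the data are simulated, and the p-value comes from a naive permutation test that ignores the fact that the residuals are estimated. All function and variable names are hypothetical.

```python
import numpy as np


def centered_distances(a):
    """Double-centered Euclidean distance matrix of the rows of `a`."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    d = np.sqrt(((a[:, None, :] - a[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def dcov_sq(x, y):
    """Squared sample distance covariance (V-statistic form)."""
    return (centered_distances(x) * centered_distances(y)).mean()


def independence_pvalue(x, y, n_perm=499, seed=0):
    """Permutation p-value for independence of x and y (assumed test, see lead-in)."""
    rng = np.random.default_rng(seed)
    A = centered_distances(x)
    B = centered_distances(y)
    observed = (A * B).mean()
    n = A.shape[0]
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Permuting the rows and columns of B permutes the y observations.
        if (A * B[np.ix_(p, p)]).mean() >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Simulated data (assumption): the true regression function is quadratic.
rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=200)
y = x ** 2 + rng.normal(scale=0.3, size=200)

# Misspecified model: a straight line. Its residuals still depend on x.
res_linear = y - np.polyval(np.polyfit(x, y, deg=1), x)
p_linear = independence_pvalue(res_linear, x)

# Correctly specified model: a quadratic. Its residuals are close to pure noise.
res_quadratic = y - np.polyval(np.polyfit(x, y, deg=2), x)
p_quadratic = independence_pvalue(res_quadratic, x)

print("p-value, linear fit:   ", p_linear)
print("p-value, quadratic fit:", p_quadratic)
```

A small p-value for the linear fit signals remaining dependence between residuals and covariate, that is, lack of fit. A large p-value for the quadratic fit means no lack of fit was detected, which is weaker than a proof that the model is correct.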
Date of publication
Resource type
Rights statement
  • In Copyright
  • Wu, Michael
  • Zeng, Donglin
  • Kosorok, Michael
  • North, Kari
  • Chen, Mengjie
  • Doctor of Philosophy
Degree granting institution
  • University of North Carolina at Chapel Hill Graduate School
Graduation year
  • 2016
