Applications of Independence Statistics to Goodness-Of-Fit, Multivariate Change Point Estimation and Clustering of Variables

Teran Hidalgo, Sebastian

Last Modified

March 20, 2019

Creator

Teran Hidalgo, Sebastian
- Affiliation: Gillings School of Global Public Health, Department of Biostatistics

Abstract

Independence statistics measure the statistical dependence between two random vectors of general dimension and type. They do not assume a specific form of dependence and are sensitive to all forms of departure from independence. This dissertation extends the use of independence statistics to three settings. In the first part, we develop a goodness-of-fit test for smoothing spline ANOVA models, a nonparametric regression methodology with the useful property that the contribution of the covariates can be decomposed in an ANOVA fashion. The proposed method derives estimated residuals from the model and then uses independence statistics to evaluate the statistical dependence between those residuals and the covariates. If no dependence is found, the model is judged to fit the data well. Application of the method is demonstrated with an analysis of neonatal mental development data. In the second part, we develop a method for the change point problem in which two sets of random vectors are observed sequentially over a dimension and, at some unknown point, the relationship between them changes. We propose a methodology to estimate the unknown change point without assuming a model. This is accomplished by assessing, with an independence statistic, the strength of the association before and after each candidate change point. We also develop a test for the hypothesis that a change point exists. We demonstrate its use with blood glucose and physical activity measurements on an individual with type 1 diabetes. In the third part, we develop a method for hierarchical clustering of variables that controls the type I error rate, which common clustering methods do not do. We accomplish this by turning the decision of whether to join two clusters into a hypothesis testing problem. The strength of our method is shown by clustering genes from single-cell data from different tumors.
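
The residual-based goodness-of-fit idea in the abstract can be illustrated with a small sketch. The abstract does not say which independence statistic the dissertation uses, so the code below assumes the sample distance covariance with a permutation test, one well-known choice. It also swaps the smoothing spline ANOVA model for a deliberately misspecified straight-line fit to keep the example short. All function and variable names are hypothetical and are not taken from the dissertation.

```python
import math
import random


def centered_distances(points):
    """Pairwise Euclidean distances with row, column and grand means removed."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def dcov_sq(a, b):
    """Squared sample distance covariance from two centered distance matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Permutation test of independence between samples x and y.

    x and y are equal-length lists of tuples (one tuple per observation).
    Returns the observed statistic and a permutation p-value.
    """
    rng = random.Random(seed)
    a = centered_distances(x)
    b = centered_distances(y)
    observed = dcov_sq(a, b)
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        # Permuting the observations of y permutes rows and columns of b together.
        b_perm = [[b[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if dcov_sq(a, b_perm) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Assumed toy data: the true relationship is quadratic, but a line is fitted.
data_rng = random.Random(1)
x = [data_rng.uniform(-1, 1) for _ in range(60)]
y = [xi ** 2 + data_rng.gauss(0, 0.05) for xi in x]

# Ordinary least-squares straight line, standing in for the fitted model.
mx = sum(x) / len(x)
my = sum(y) / len(y)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
    (xi - mx) ** 2 for xi in x)
residuals = [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]

# Dependence between residuals and the covariate signals lack of fit.
stat, p_value = permutation_pvalue([(xi,) for xi in x],
                                   [(r,) for r in residuals])
print(f"distance covariance^2 = {stat:.4f}, permutation p-value = {p_value:.3f}")
```

A small p-value here indicates that the residuals still depend on the covariate, so the fitted line misses structure in the data. Because observations are passed as tuples, the same functions accept multivariate covariates without modification.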

Date of publication

August 2016

DOI

https://doi.org/10.17615/p6n0-4096

Resource type

Dissertation

Rights statement

In Copyright

Advisor

Wu, Michael
Zeng, Donglin
Kosorok, Michael
North, Kari
Chen, Mengjie

Degree

Doctor of Philosophy

Degree granting institution

University of North Carolina at Chapel Hill Graduate School

Graduation year

2016

Language

English

Items

Title	Date Uploaded	Visibility
TeranHidalgo_unc_0153D_16341.pdf	2019-04-11	Public
