Estimation and hypothesis testing with additive kernel machines for high-dimensional data

Clark, Jennifer

Last Modified

March 22, 2019

Creator

Clark, Jennifer
- Affiliation: Gillings School of Global Public Health, Department of Biostatistics

Abstract

Advances in high-throughput biotechnology have culminated in the development of large-scale, population-based studies for identifying genomic features (e.g., genes, SNPs, CpGs) associated with complex diseases and traits. Understanding an individual's genetic predisposition for particular traits and diseases can inform the development of individualized risk profiles and treatment regimes, and it simultaneously provides clues as to the biological mechanisms underlying complex traits. However, the high dimensionality of the feature space, the limited availability of samples, and our incomplete understanding of how features biologically influence various diseases pose a major challenge for statisticians. To mitigate some of these challenges, we propose several new methods. First, we develop the additive least square kernel machine (ALSKM) approach for nonparametrically modeling and testing the cumulative effect of a group of features (such as multiple biologically related CpGs) while nonparametrically adjusting for covariates with complex, nonlinear effects. Our proposed methods model both the genomic features and the complex covariates using the kernel machine framework. Second, building on the ALSKM, we develop a novel approach for testing for interactions between two different groups of (biologically related) features. Specifically, we develop a multi-marker test for epistasis, or gene-gene interactions, between two different groups of genomic features. Finally, we again build on the machinery developed under Topics 1 and 2 to develop an approach for testing the association between rare variants and a phenotype in the presence of common variants while accommodating potential interactions between the common and rare variants. By focusing on multi-feature testing, these approaches reduce the dimensionality of the data. Using the kernel machine framework allows for flexible, possibly nonparametric, analysis, which is important given our incomplete understanding of how features influence various traits and diseases.
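
To make the additive kernel machine idea in the abstract concrete, the following is a minimal sketch and not the dissertation's implementation. It assumes a model of the form y = h_Z(Z) + h_X(X) + error, where Z is a group of genomic features and X holds the covariates, each component is given a Gaussian kernel, and the smoothing parameters tau are fixed by hand rather than estimated. All names (gaussian_kernel, fit_additive_kernel_machine, tau_list, rho) are illustrative assumptions, and no intercept or hypothesis test is included.

```python
import numpy as np


def gaussian_kernel(A, rho):
    """Gaussian kernel matrix with K[i, j] = exp(-||a_i - a_j||^2 / rho)."""
    sq = np.sum(A ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (A @ A.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(d2, 0.0) / rho)


def fit_additive_kernel_machine(y, K_list, tau_list):
    """Penalized least squares fit of y = sum_j h_j + error.

    Each h_j lies in the function space generated by kernel K_j. For fixed
    tau_j (assumed known here), the minimizer is
        h_j = tau_j * K_j @ V^{-1} y,  with V = I + sum_j tau_j * K_j.
    Returns the list of fitted component vectors h_j.
    """
    n = len(y)
    V = np.eye(n) + sum(t * K for t, K in zip(tau_list, K_list))
    r = np.linalg.solve(V, y)  # r is also the residual y - sum_j h_j
    return [t * (K @ r) for t, K in zip(tau_list, K_list)]


# Simulated data (purely illustrative, not from the dissertation).
rng = np.random.default_rng(0)
n = 100
Z = rng.normal(size=(n, 5))   # hypothetical group of genomic features
X = rng.normal(size=(n, 2))   # hypothetical covariates
y = np.sin(Z[:, 0]) + X[:, 0] ** 2 + 0.1 * rng.normal(size=n)

K_Z = gaussian_kernel(Z, rho=5.0)
K_X = gaussian_kernel(X, rho=2.0)
h_Z, h_X = fit_additive_kernel_machine(y, [K_Z, K_X], tau_list=[1.0, 1.0])
fitted = h_Z + h_X
```

In practice the tau values and kernel scale parameters would be estimated from the data, for example through the linear mixed model representation of the kernel machine, and testing a group effect amounts to testing whether the corresponding tau is zero. Those steps are omitted from this sketch.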

Date of publication