Kernel machine methods for analysis of genomic data from different sources

Zhao, Ni

Last Modified

March 22, 2019

Creator

Zhao, Ni
- Affiliation: Gillings School of Global Public Health, Department of Biostatistics

Abstract

A comprehensive understanding of complex trait etiology requires examining multiple sources of genomic variability. Recent advances in high-throughput biotechnology, especially sequencing technology, have enabled multi-platform genomic profiling of biological samples. In this dissertation, we use the kernel machine regression (KMR) framework to analyze data from different genetic data sources. In the first part, we develop a new strategy, based on functional regression, for identifying large-scale, global changes in methylation that are associated with environmental variables or clinical outcomes. The density or cumulative distribution function of each individual's methylation values is approximated using B-spline basis functions, and the spline coefficients summarize that individual's overall methylation profile. We propose a variance component score test for association between the overall distribution and a continuous or dichotomous outcome, and apply it to two real studies. In the second part, we construct a microbiome regression-based kernel association test (MiRKAT) for testing the association between microbial community profiles and a continuous or dichotomous variable of interest, such as an environmental exposure or disease status. This method regresses the outcome on the covariates (including potential confounders) and on the microbiome compositional profiles through kernel functions. Through simulations and real studies, we demonstrate that MiRKAT offers improved control of type I error and superior power compared to existing methods. In the final part, we focus on integrative analysis of genome-wide association studies (GWAS) and methylation studies. We propose using KMR first to test the cumulative genetic/epigenetic effect on a trait and then to carry out mediation analysis to understand the mechanisms by which the genomic data influence the trait. In particular, we develop an approach that works at the gene level, which provides a common unit of analysis across data types. We compare pairwise similarity in trait values between individuals to pairwise similarity in methylation and genotype values; correspondence between the two is suggestive of association. For a significant gene, we develop a causal steps approach to mediation analysis that elucidates how the different data types work, or do not work, together.
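
The idea of comparing pairwise similarity in trait values with pairwise similarity in genomic profiles can be illustrated with a small sketch. This is not the dissertation's implementation: it assumes a continuous outcome, a plain linear kernel, and a permutation p-value in place of an analytic null distribution for the variance component score statistic. The function names (linear_kernel, kernel_score_test) and the simulated data are hypothetical and chosen only for illustration.

```python
import numpy as np


def linear_kernel(Z):
    """Pairwise similarity between samples: K[i, j] = <z_i, z_j> after centering columns."""
    Zc = Z - Z.mean(axis=0)
    return Zc @ Zc.T


def kernel_score_test(y, X, K, n_perm=999, seed=0):
    """Score-type statistic Q = r' K r with a permutation p-value.

    y : (n,) continuous outcome
    X : (n, p) covariates (an intercept is added here)
    K : (n, n) kernel matrix of pairwise sample similarities
    Assumption: residuals are permuted to approximate the null distribution;
    this stands in for an analytic null and is only a rough approximation.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])

    # Null model: regress the outcome on covariates only.
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta

    # Q is large when samples with similar profiles have similar residuals.
    q_obs = resid @ K @ resid

    q_perm = np.empty(n_perm)
    for b in range(n_perm):
        r = rng.permutation(resid)
        q_perm[b] = r @ K @ r

    p_value = (1 + np.sum(q_perm >= q_obs)) / (n_perm + 1)
    return q_obs, p_value


# Simulated example (hypothetical data): the first feature of Z drives the outcome.
rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=(n, 2))
Z = rng.normal(size=(n, 5))
y = X @ np.array([1.0, -1.0]) + 2.0 * Z[:, 0] + rng.normal(size=n)

q, p = kernel_score_test(y, X, linear_kernel(Z))
print(f"Q = {q:.1f}, permutation p-value = {p:.4f}")
```

A dichotomous outcome would need a logistic null model in place of the least-squares fit, and other kernels (for example, ones built from ecological distances between microbiome profiles) could be substituted for linear_kernel without changing the rest of the sketch.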

Date of publication