GRAPHICAL MODELS FOR HIGH DIMENSIONAL DATA WITH GENOMIC APPLICATIONS

Yang, Jenny

Last Modified

March 21, 2019

Creator

Yang, Jenny
- Affiliation: Gillings School of Global Public Health, Department of Biostatistics

Abstract

Many previous studies have demonstrated that gene expression or other types of -omic features collected from patients can help with disease diagnosis or treatment selection. For example, a few recent studies demonstrated that gene expression data collected from cancer cell lines are highly informative for predicting cancer drug sensitivity (Garnett et al., 2012; Barretina et al., 2012; Chen et al., 2016b). This is partly because many cancer drugs are targeted drugs that perturb a particular mutated gene or protein, and thus having that mutation, or observing the consequence of such a mutation in gene expression data, is highly informative for drug sensitivity prediction. Such systematic studies of drug sensitivity require giving different drugs in a series of doses to the same cell line, which is not possible in human studies. More sophisticated methods are therefore needed to estimate the potential effects of cancer drugs from observational data. Since the effect of a targeted cancer drug can be considered an intervention on the molecular system of cancer cells, a directed graphical model for gene-gene associations is a natural choice to model the molecular system and to study the consequences of such interventions. In this dissertation, we develop new statistical methods to estimate directed acyclic graphs (DAGs) from high dimensional -omic data under two scenarios: i) with a model-free approach and ii) with single cell RNA-seq (scRNAseq) data. In Chapter 1, we give a brief introduction to graphical models, their various statistical characterizations, and current approaches to estimating graph structures. We then review scRNAseq data and current approaches to analyzing them. In Chapter 2, we propose a model-free method to estimate graphical models in two steps. The first step uses a model-free variable selection method based on the principles of sufficient dimension reduction. The second step uses a non-parametric conditional independence testing method that utilizes embeddings of the conditional spaces into reproducing kernel Hilbert spaces. We review the theoretical background needed to establish the asymptotic graphical model estimation consistency of this two-step approach. We examine its performance in simulations and on TCGA breast cancer data, where we find significant improvements over current methods that require strong model assumptions. In Chapter 3, we propose a graphical model algorithm to analyze scRNAseq data. As in Chapter 2, we create a two-step estimation method, here utilizing a joint penalized zero-inflation model. We assess its performance and drawbacks in simulations. We then examine its utility when applied after clustering to a sample of 68k peripheral blood mononuclear cells with multiple subpopulations.

Date of publication

December 2017

DOI

https://doi.org/10.17615/mp26-yf10

Resource type

Dissertation

Rights statement

In Copyright

Advisor

Sun, Wei
Li, Quefeng
Engel, Stephanie
Liu, Yufeng
Zeng, Donglin

Degree

Doctor of Philosophy

Degree granting institution

University of North Carolina at Chapel Hill Graduate School

Graduation year

2017

Language

English

Items

Title	Date Uploaded	Visibility
Yang_unc_0153D_17393.pdf	2019-04-11	Public
