Last Modified
  • March 21, 2019
Creator
  • Yang, Jenny
    • Affiliation: Gillings School of Global Public Health, Department of Biostatistics
Abstract
  • Many previous studies have demonstrated that gene expression and other types of -omic features collected from patients can aid disease diagnosis and treatment selection. For example, several recent studies demonstrated that gene expression data collected from cancer cell lines are highly informative for predicting cancer drug sensitivity (Garnett et al., 2012; Barretina et al., 2012; Chen et al., 2016b). This is partly because many cancer drugs are targeted drugs that perturb a particular mutated gene or protein, so having that mutation, or observing its consequences in gene expression data, is highly informative for drug sensitivity prediction. Such systematic studies of drug sensitivity require giving different drugs in a series of doses to the same cell line, which is not possible in human studies. More sophisticated methods are therefore needed to estimate the potential effects of cancer drugs from observational data. Since the effect of a targeted cancer drug can be viewed as an intervention on the molecular system of cancer cells, a directed graphical model of gene-gene associations is a natural choice for modeling the molecular system and studying the consequences of such interventions. In this dissertation, we develop new statistical methods to estimate directed acyclic graphs (DAGs) from high-dimensional -omic data under two scenarios: i) using a model-free approach and ii) using single-cell RNA-seq (scRNAseq) data. In Chapter 1, we give a brief introduction to graphical models, their various statistical characterizations, and the most current approaches to estimating graph structures. We then review scRNAseq data and current approaches to analyzing them. In Chapter 2, we propose a model-free method that estimates graphical models in two steps. The first step uses a model-free variable selection method based on the principles of sufficient dimension reduction. The second step uses a non-parametric conditional independence test that utilizes embeddings of the conditional spaces into reproducing kernel Hilbert spaces. We review the theoretical background needed to establish the asymptotic graphical model estimation consistency of this two-step approach. We examine its performance in simulations and on TCGA breast cancer data, where we find significant improvements over current methods that require strong model assumptions. In Chapter 3, we propose a graphical model algorithm for analyzing scRNAseq data. As with the previous algorithm, we construct a two-step estimation method, which utilizes a joint penalized zero-inflation model. We assess its performance and drawbacks in simulations. Finally, we examine its utility when applied after clustering to a sample of 68k peripheral blood mononuclear cells with multiple subpopulations.
Date of publication
Resource type
Rights statement
  • In Copyright
  • Sun, Wei
  • Li, Quefeng
  • Engel, Stephanie
  • Liu, Yufeng
  • Zeng, Donglin
  • Doctor of Philosophy
Degree granting institution
  • University of North Carolina at Chapel Hill Graduate School
Graduation year
  • 2017
