Estimation of Graphical Models with Biological Applications

Graphical models are widely used to represent dependency relationships among random variables. In this dissertation, we develop three statistical methodologies for estimating graphical models from high-dimensional genomic data. The first two estimate undirected Gaussian graphical models (GGMs), which capture conditional dependence among variables; the third is a novel method for estimating a Gaussian directed acyclic graph (DAG).
In the first project, we focus on estimating GGMs from groups of dependent data. A motivating example is gene expression measured in multiple tissues from the same individual. Existing methods that assume independence among graphs are not applicable in this setting. To estimate multiple dependent graphs, we decompose the problem into two graphical layers: the systemic layer, a network that affects all outcomes and therefore describes cross-graph dependency, and the category-specific layer, which represents graph-specific variation. We propose a new graphical EM technique that estimates the two layers jointly, and we establish the estimation consistency and selection sparsistency of the proposed estimator. Simulations and a real data analysis confirm that our EM method is superior to a naive one-step method.
Next, we consider estimating GGMs from noisy data. A notable drawback of existing methods for estimating GGMs is that they ignore measurement error, which is common in biological data. We propose a new experimental design using technical replicates and develop an EM-based methodology that efficiently estimates the sparse GGM while accounting for measurement error. We systematically study the asymptotic properties of the proposed method in high-dimensional settings. A simulation study suggests that our method has substantially higher sensitivity and specificity than existing methods in estimating the underlying graph.
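To illustrate why technical replicates make measurement error tractable, here is a minimal sketch assuming additive Gaussian noise with a diagonal covariance D, independent of the true signal. It is a simple method-of-moments correction, not the EM algorithm developed in the dissertation, and the names `simulate_replicates` and `corrected_covariance` are hypothetical. The spread among replicates of the same subject estimates D, which can then be subtracted from the covariance of the replicate means before inverting to obtain the precision matrix (whose zero pattern defines the GGM).

```python
import numpy as np


def simulate_replicates(sigma, noise_var, n, reps, rng):
    """Draw n subjects, each measured `reps` times with additive noise.

    Assumed (illustrative) model: Y[i, r] = X[i] + E[i, r], where
    X[i] ~ N(0, sigma) and E[i, r] ~ N(0, diag(noise_var)) independently.
    """
    p = sigma.shape[0]
    x = rng.multivariate_normal(np.zeros(p), sigma, size=n)   # latent signal, (n, p)
    e = rng.normal(size=(n, reps, p)) * np.sqrt(noise_var)    # replicate noise, (n, reps, p)
    return x[:, None, :] + e                                  # observed data, (n, reps, p)


def corrected_covariance(y):
    """Method-of-moments estimate of cov(X) that removes replicate noise."""
    reps = y.shape[1]
    # Covariance of the replicate means estimates sigma + D / reps.
    s_bar = np.cov(y.mean(axis=1), rowvar=False)
    # Variance across replicates of the same subject estimates the noise variance D.
    d_hat = y.var(axis=1, ddof=1).mean(axis=0)
    sigma_hat = s_bar - np.diag(d_hat / reps)
    # Clip eigenvalues so the estimate stays positive definite and invertible.
    w, v = np.linalg.eigh(sigma_hat)
    return (v * np.clip(w, 1e-6, None)) @ v.T


# Usage: a 5-node chain graph (tridiagonal precision matrix), 3 replicates each.
rng = np.random.default_rng(0)
p = 5
omega = 2.0 * np.eye(p) - 0.8 * (np.eye(p, k=1) + np.eye(p, k=-1))
sigma = np.linalg.inv(omega)
y = simulate_replicates(sigma, noise_var=np.ones(p), n=20000, reps=3, rng=rng)

naive_prec = np.linalg.inv(np.cov(y.mean(axis=1), rowvar=False))  # ignores the noise
corrected_prec = np.linalg.inv(corrected_covariance(y))
naive_err = np.linalg.norm(naive_prec - omega)
corrected_err = np.linalg.norm(corrected_prec - omega)
print("naive error:", naive_err)
print("corrected error:", corrected_err)
```

Ignoring the noise inflates the covariance by D / reps, which shrinks the estimated precision matrix and distorts the partial correlations that define the graph; the corrected estimate should land much closer to `omega`. In high dimensions this plain inversion would be replaced by a sparse estimator.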
Lastly, we consider estimating the skeleton of a DAG from observational data. We propose a novel method, AdaPC, that efficiently estimates the skeleton of a DAG via a two-step approach. We systematically evaluate its performance through numerical examples.
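The abstract does not detail the two steps of AdaPC, so the sketch below does not reproduce it. Instead it shows the skeleton phase of the standard PC algorithm, a common baseline for this task, assuming Gaussian data so that conditional independence can be tested with Fisher's z-transform of partial correlations. The function names and the significance level `alpha` are illustrative choices. Starting from the complete graph, an edge i–j is deleted as soon as some subset of i's other neighbours renders i and j conditionally independent, with conditioning sets growing one variable at a time.

```python
import itertools
import math
from statistics import NormalDist

import numpy as np


def partial_corr(corr, i, j, cond):
    """Partial correlation of variables i and j given the variables in `cond`."""
    idx = [i, j, *cond]
    prec = np.linalg.inv(corr[np.ix_(idx, idx)])
    return -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])


def pc_skeleton(data, alpha=0.01):
    """Skeleton phase of the standard PC algorithm with Fisher-z CI tests."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    adj = {v: set(range(p)) - {v} for v in range(p)}  # start from the complete graph
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    level = 0                                         # size of conditioning sets
    while any(len(adj[v]) - 1 >= level for v in range(p)):
        for i in range(p):
            for j in sorted(adj[i]):                  # snapshot: adj[i] changes below
                others = sorted(adj[i] - {j})
                for cond in itertools.combinations(others, level):
                    r = partial_corr(corr, i, j, cond)
                    r = min(max(r, -0.999999), 0.999999)
                    z = math.sqrt(n - level - 3) * math.atanh(r)
                    if abs(z) < z_crit:               # cannot reject independence
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        level += 1
    return {(i, j) for i in range(p) for j in adj[i] if i < j}


# Usage: chain DAG x0 -> x1 -> x2 plus an isolated node x3.
rng = np.random.default_rng(1)
n = 2000
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
x3 = rng.normal(size=n)
edges = pc_skeleton(np.column_stack([x0, x1, x2, x3]), alpha=1e-4)
print(sorted(edges))
```

With this much data the recovered skeleton should, with high probability, be the undirected chain 0–1–2: the edge 0–2 is dropped once x1 enters the conditioning set, and x3 is disconnected at the first level. Orienting edges is a separate step not shown here.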