Bayesian Viral Substitution Analysis and Covariance Estimation via Generalized Fiducial Inference

Shi, Wen

Last Modified

March 19, 2019

Creator

Shi, Wen
- Affiliation: College of Arts and Sciences, Department of Statistics and Operations Research

Abstract

With advances in biology and computing technologies, there is an increasing amount of large-scale biological data awaiting analysis. Aiming to develop statistical tools for omics data, we focus on the problem of modeling viral sequencing data as well as a fundamental statistical question with applications in biology and many other fields. This dissertation comprises three major parts. Motivated by a case-control study of influenza viral populations sampled at multiple time points, in the first part we model the sequencing data of a viral population under a Bayesian Dirichlet mixture distribution. We have developed an efficient clustering scheme that enables us to distinguish changes caused by treatment from variation within viral populations. As a proof of concept, we applied our method to a well-studied HIV dataset and successfully identified known drug-resistant regions and additional potential sites. For the influenza data, our algorithm revealed two genome sites with strong evidence of a treatment effect. The second part of the thesis concerns covariance matrix estimation in high-dimensional multivariate linear models with sparse covariates, using fiducial inference. The sparsity imposed on the covariate matrix allows us to estimate relationships between a list of gene expressions and several metabolic levels in a high-dimension, low-sample-size setting. Aiming to quantify the uncertainty of the estimators without having to choose a prior, we have developed a fiducial approach to the estimation of the covariance matrix. Building upon the fiducial Bernstein-von Mises theorem, we show that the fiducial distribution of the covariance matrix is consistent under our framework. Furthermore, we propose an adaptive, efficient reversible jump Markov chain Monte Carlo algorithm for sampling from the fiducial distribution, which enables us to define a meaningful confidence region for the covariance matrix.
In the last part of the thesis, we examine stochastic models for capturing the evolutionary processes of gene expression levels. Generalizing a Brownian motion (BM) model for microarray data, we have developed a BM model for high-throughput sequencing data that takes sampling variance into account. To allow for conservation in the evolutionary process, we also investigate Ornstein-Uhlenbeck (OU) models. Applying these models to a multiple-tissue mammalian dataset, we showed that the OU model is more appropriate for the top 10 highly expressed genes in the dataset, and we performed hypothesis tests for significant changes in gene expression levels along specific lineages.

Date of publication

August 2015

DOI

https://doi.org/10.17615/12sc-ba37

Identifier

Shi_unc_0153D_15514.pdf

Resource type

Dissertation

Rights statement

In Copyright

Advisor

Bhamidi, Shankar
Jones, Corbin
Lu, Shu
Hannig, Jan
Zhang, Kai

Degree

Doctor of Philosophy

Degree granting institution

University of North Carolina at Chapel Hill Graduate School

Graduation year

2015

Language

English

Publisher

University of North Carolina at Chapel Hill Graduate School

Place of publication

Chapel Hill, NC

Access right

There are no restrictions to this item.

Date uploaded

August 25, 2015

Items

Title	Date Uploaded	Visibility
Shi_unc_0153D_15514.pdf	2019-04-11	Public
