Statistical Essays Motivated by Genome-Wide Association Study
Citation
MLA
Wang, Ling. Statistical Essays Motivated by Genome-wide Association Study. Chapel Hill, NC: University of North Carolina at Chapel Hill Graduate School, 2015. https://doi.org/10.17615/map0-m470
APA
Wang, L. (2015). Statistical Essays Motivated by Genome-Wide Association Study. Chapel Hill, NC: University of North Carolina at Chapel Hill Graduate School. https://doi.org/10.17615/map0-m470
Chicago
Wang, Ling. 2015. Statistical Essays Motivated by Genome-Wide Association Study. Chapel Hill, NC: University of North Carolina at Chapel Hill Graduate School. https://doi.org/10.17615/map0-m470
- Last Modified
- March 19, 2019
- Creator
- Wang, Ling
- Affiliation: College of Arts and Sciences, Department of Statistics and Operations Research
- Abstract
- Genome-wide association studies (GWAS) have gained popularity in recent years and have attracted considerable interest from statisticians. In this dissertation, motivated by GWAS, we develop statistical methods to identify significant single-nucleotide polymorphisms (SNPs) that are associated with phenotypic traits of interest. In GWAS the number of SNPs is usually much larger than the number of individuals, so identifying significant SNPs and estimating their effects is a high-dimensional selection and estimation problem, sometimes referred to as the "large p, small n" (p >> n) paradigm. We propose three approaches to estimate the proportion of SNPs that are significantly associated with the trait of interest, as well as the distribution of their effects.

The first approach extends earlier work that models the SNP effects as random effects in a linear mixed model. We instead place a mixture prior on the random effects, consisting of a point mass at zero for the non-significant SNPs plus a normal component for the significant SNPs. We develop a fast Markov chain Monte Carlo (MCMC) algorithm to estimate the model parameters. The algorithm reduces computation time substantially by calculating the posterior conditional on a set of latent variables that indicate whether each SNP is associated with the trait of interest.

The second approach relaxes the prior to a mixture of a point mass at zero and a nonparametric distribution. Two types of sieve estimators are proposed, based on a least squares (LS) method for probability distributions within the framework of measurement error models. The first estimator minimizes the distance between the empirical and model distribution functions; the second minimizes the distance between the empirical and model characteristic functions.
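To make the first approach more concrete, here is a minimal sketch of a generic spike-and-slab Gibbs sampler for a linear model y = Xβ + e, in which each SNP effect β_j is exactly zero with probability 1 − π and drawn from N(0, τ²) otherwise. Each latent indicator γ_j is sampled from its conditional posterior with β_j integrated out, and β_j is then drawn given γ_j. This is not the dissertation's algorithm: it is a textbook single-site sampler, the noise variance σ² and slab variance τ² are assumed known, π is given a uniform Beta(1, 1) prior, and the function names `simulate_gwas` and `spike_slab_gibbs` and the toy genotype data are hypothetical.

```python
import math
import random


def simulate_gwas(n, p, causal, effect, maf=0.3, seed=1):
    """Hypothetical toy data: standardized 0/1/2 genotypes and a quantitative trait."""
    rng = random.Random(seed)
    X = []  # stored column-wise: X[j] is SNP j across all n individuals
    for _ in range(p):
        g = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
        mu = sum(g) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in g) / n) or 1.0
        X.append([(v - mu) / sd for v in g])
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]  # unit noise variance
    for j in causal:
        for i in range(n):
            y[i] += effect * X[j][i]
    return X, y


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def spike_slab_gibbs(X, y, sigma2=1.0, tau2=1.0, iters=400, burn=100, seed=2):
    """Single-site Gibbs sampler; sigma2 and tau2 are assumed known (a simplification)."""
    rng = random.Random(seed)
    p, n = len(X), len(y)
    beta, gamma, pi = [0.0] * p, [0] * p, 0.5
    resid = list(y)  # y - X beta, with beta = 0 initially
    xx = [sum(v * v for v in col) for col in X]
    incl, pi_sum = [0.0] * p, 0.0
    for it in range(iters):
        for j in range(p):
            col = X[j]
            if beta[j] != 0.0:  # partial residual: add SNP j's contribution back
                for i in range(n):
                    resid[i] += col[i] * beta[j]
            v = 1.0 / (xx[j] / sigma2 + 1.0 / tau2)  # slab posterior variance
            m = v * sum(c * r for c, r in zip(col, resid)) / sigma2  # slab posterior mean
            # log prior odds + log Bayes factor (slab vs. spike), beta_j integrated out
            log_odds = (math.log(pi / (1.0 - pi)) + 0.5 * math.log(v / tau2)
                        + 0.5 * m * m / v)
            gamma[j] = 1 if rng.random() < sigmoid(log_odds) else 0
            beta[j] = rng.gauss(m, math.sqrt(v)) if gamma[j] else 0.0
            if beta[j] != 0.0:
                for i in range(n):
                    resid[i] -= col[i] * beta[j]
        k = sum(gamma)
        pi = rng.betavariate(1 + k, 1 + p - k)  # conjugate update under Beta(1, 1)
        pi = min(max(pi, 1e-12), 1.0 - 1e-12)
        if it >= burn:
            for j in range(p):
                incl[j] += gamma[j]
            pi_sum += pi
    kept = iters - burn
    return [c / kept for c in incl], pi_sum / kept
```

For example, `spike_slab_gibbs(*simulate_gwas(n=150, p=20, causal=[0, 5, 12], effect=1.0))` returns a posterior inclusion probability for each simulated SNP and a posterior-mean estimate of π, the proportion of associated SNPs. Drawing γ_j with β_j integrated out is what lets the chain move freely between the point mass and the normal component; any further speed-ups from the dissertation are not reproduced here.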
In the last part, we propose an estimator for the normal means problem that adapts to the sparsity of the mean signals and accounts for correlation among the signals. The estimator decomposes the arbitrary covariance matrix of the observed signals into two parts: principal factors that drive the strong dependence, and weakly dependent error terms. Removing the largest common factors substantially weakens the correlation among the signals. An automatic nonparametric empirical Bayes method is then used to estimate the sparsity and identify the nonzero means.
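The factor-adjustment idea in the last approach can be sketched as follows, under several stated assumptions: the covariance matrix Σ is known, only the single leading principal factor is removed (found by power iteration), and the factor score is estimated by L1 regression of the observations on the loadings (a weighted median) so that the sparse nonzero means do not bias it. A parametric two-group spike-and-slab model fitted by EM stands in for the dissertation's nonparametric empirical Bayes step, and all function names and the toy data are hypothetical.

```python
import math
import random


def simulate_one_factor(p=100, n_signal=10, signal=6.0, factor=2.0, seed=0):
    """Hypothetical toy data: z = mu + b * W + eps, so sigma = b b^T + I."""
    rng = random.Random(seed)
    b = [rng.gauss(0.0, 1.0) for _ in range(p)]
    sigma = [[b[i] * b[j] + (1.0 if i == j else 0.0) for j in range(p)] for i in range(p)]
    mu = [signal if i < n_signal else 0.0 for i in range(p)]
    z = [mu[i] + b[i] * factor + rng.gauss(0.0, 1.0) for i in range(p)]
    return z, sigma, mu


def leading_factor(sigma, iters=100):
    """Leading eigenvalue/eigenvector of a symmetric PSD matrix by power iteration."""
    p = len(sigma)
    v = [1.0 / math.sqrt(p)] * p
    lam = 0.0
    for _ in range(iters):
        w = [sum(sigma[i][k] * v[k] for k in range(p)) for i in range(p)]
        lam = math.sqrt(sum(x * x for x in w))
        if lam == 0.0:
            break
        v = [x / lam for x in w]
    return lam, v


def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half, acc = sum(weights) / 2.0, 0.0
    for val, w in pairs:
        acc += w
        if acc >= half:
            return val
    return pairs[-1][0]


def factor_adjust(z, sigma):
    """Remove the leading common factor from z ~ N(mu, sigma); one-factor assumption."""
    lam, v = leading_factor(sigma)
    b = [math.sqrt(lam) * x for x in v]  # loadings of the principal factor
    # L1 regression: minimising sum_i |z_i - b_i w| is a weighted median of
    # z_i / b_i with weights |b_i|, which is robust to a few large mu_i.
    idx = [i for i in range(len(z)) if abs(b[i]) > 1e-12]
    w_hat = weighted_median([z[i] / b[i] for i in idx], [abs(b[i]) for i in idx])
    # Diagonal of sigma - b b^T: variances of the weakly dependent errors.
    err_var = [max(sigma[i][i] - b[i] * b[i], 1e-8) for i in range(len(z))]
    return [(z[i] - b[i] * w_hat) / math.sqrt(err_var[i]) for i in range(len(z))]


def normal_pdf(x, var):
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def two_group_eb(z, iters=200):
    """Parametric stand-in: mu_i ~ (1 - pi) delta_0 + pi N(0, tau2), so
    z_i ~ (1 - pi) N(0, 1) + pi N(0, 1 + tau2); pi and tau2 fitted by EM."""
    pi, tau2 = 0.1, 4.0
    post = [0.0] * len(z)
    for _ in range(iters):
        for i, x in enumerate(z):  # E-step: P(mu_i != 0 | z_i)
            slab = pi * normal_pdf(x, 1.0 + tau2)
            total = slab + (1.0 - pi) * normal_pdf(x, 1.0)
            post[i] = slab / total if total > 0.0 else 1.0
        s = sum(post)  # M-step: mixing weight and slab variance
        pi = min(max(s / len(z), 1e-6), 1.0 - 1e-6)
        tau2 = max(sum(q * x * x for q, x in zip(post, z)) / max(s, 1e-12) - 1.0, 1e-6)
    shrink = tau2 / (1.0 + tau2)
    means = [q * shrink * x for q, x in zip(post, z)]  # E[mu_i | z_i]
    return pi, post, means
```

After `z_adj = factor_adjust(z, sigma)`, calling `two_group_eb(z_adj)` returns the estimated proportion of nonzero means, the posterior probability that each mean is nonzero, and shrunken posterior-mean estimates. Removing more than one factor would replace the power iteration with a partial eigendecomposition and the weighted median with a multivariate L1 regression.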
- Date of publication
- August 2015
- DOI
- https://doi.org/10.17615/map0-m470
- Rights statement
- In Copyright
- Advisor
- Ji, Chuanshu
- Carlstein, Edward
- Guo, Guang
- Shen, Haipeng
- Smith, Richard L.
- Degree
- Doctor of Philosophy
- Degree granting institution
- University of North Carolina at Chapel Hill Graduate School
- Graduation year
- 2015
- Publisher
- University of North Carolina at Chapel Hill Graduate School
- Place of publication
- Chapel Hill, NC
- Access right
- There are no restrictions on this item.
- Date uploaded
- August 25, 2015
Items
Title | Date Uploaded | Visibility |
---|---|---|
Wang_unc_0153D_15609.pdf | 2019-04-10 | Public |