STATISTICAL ANALYSES OF HIGH THROUGHPUT GENETICS AND GENOMICS DATA
Citation
MLA
Yin, Zhaoyu. Statistical Analyses Of High Throughput Genetics And Genomics Data. Chapel Hill, NC: University of North Carolina at Chapel Hill Graduate School, 2014. https://doi.org/10.17615/7sv9-m098
APA
Yin, Z. (2014). STATISTICAL ANALYSES OF HIGH THROUGHPUT GENETICS AND GENOMICS DATA. Chapel Hill, NC: University of North Carolina at Chapel Hill Graduate School. https://doi.org/10.17615/7sv9-m098
Chicago
Yin, Zhaoyu. 2014. Statistical Analyses Of High Throughput Genetics And Genomics Data. Chapel Hill, NC: University of North Carolina at Chapel Hill Graduate School. https://doi.org/10.17615/7sv9-m098
- Last Modified
- March 19, 2019
- Creator
- Yin, Zhaoyu
- Affiliation: Gillings School of Global Public Health, Department of Biostatistics
- Abstract
- Mixed effects models are commonly used to model the dependence structure within twin pairs in twin studies. However, they are computationally very intensive for eQTL (expression quantitative trait loci) analysis. To overcome this computational challenge, twin pairs can be randomly split into two groups of independent individuals, and multiple linear regression analysis can be performed on each group. In my first topic, a computationally efficient score statistic is proposed to combine the non-independent analysis results from the two groups.
Genome-wide association studies (GWAS) aim to identify genetic variants associated with complex traits. The standard first-pass GWAS analysis, in which SNPs are tested one at a time, may fail to detect associations due to, for example, multiple causal SNPs. Alternatively, regional SNP-set analyses have been established to test the association between a set of SNPs and a phenotype through a mixed effects model, where testing the association is equivalent to testing whether one or more of the variance components are equal to 0. However, the null distribution of the likelihood ratio test (LRT) statistic does not follow the conventional 50:50 mixture of chi-square distributions in this setting. My second topic investigates the spectral representation of the LRT statistic, based on which an empirical resampling procedure is proposed to approximate its null distribution.
When both GWAS and gene expression data are available on the same set of samples, it is natural to add gene expression as a covariate in the SNP-set analysis to jointly model the SNP and transcript associations with the trait. One biologically interesting question is whether the complex phenotype is associated with the gene expression conditional on the SNP effects. My last research topic jointly models the association of the gene expression and the SNP set with the trait. Unlike traditional mixed effects models, our model allows the gene expression to depend on the random SNP effects, since the independence assumption is likely to be violated when the gene expression is also associated with the SNP set. With the independence assumption relaxed, we can perform valid statistical inference and parameter estimation.
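For readers unfamiliar with the "conventional 50:50 mixture" mentioned in the abstract, the following is a minimal sketch of how a p-value is usually computed under that reference distribution (an equal mixture of a point mass at 0 and a chi-square with one degree of freedom). It illustrates only the conventional approximation that the abstract says does not hold for SNP-set variance component tests; it is not the resampling procedure proposed in the dissertation. The function names `chi2_1_sf` and `mixture_pvalue` are hypothetical and chosen for this illustration.

```python
import math


def chi2_1_sf(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square with 1 degree of freedom.

    Uses the identity P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


def mixture_pvalue(lrt: float) -> float:
    """P-value of an LRT statistic under the 0.5*chi2_0 + 0.5*chi2_1 mixture.

    Assumption for illustration: a single variance component is tested on the
    boundary of its parameter space, the textbook case for this mixture.
    """
    if lrt <= 0.0:
        # The point mass at zero makes any non-positive statistic uninformative.
        return 1.0
    return 0.5 * chi2_1_sf(lrt)
```

Under this mixture, an LRT statistic of about 2.71 corresponds to a p-value of 0.05, whereas the same statistic gives about 0.10 under a plain chi-square with one degree of freedom. When the true null distribution departs from the mixture, as the abstract states for this setting, p-values computed this way can be miscalibrated.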
- Date of publication
- December 2014
- Subject
- DOI
- https://doi.org/10.17615/7sv9-m098
- Identifier
- Resource type
- Rights statement
- In Copyright
- Advisor
- Sun, Wei
- Zhou, Haibo
- Zou, Fei
- Preisser, John
- Sullivan, Patrick
- Degree
- Doctor of Philosophy
- Degree granting institution
- University of North Carolina at Chapel Hill Graduate School
- Graduation year
- 2014
- Language
- Publisher
- Place of publication
- Chapel Hill, NC
- Access right
- There are no restrictions to this item.
- Date uploaded
- April 23, 2015
Items
Title | Date Uploaded | Visibility |
---|---|---|
Yin_unc_0153D_15000.pdf | 2019-04-11 | Public |