Enhancing eQTL analysis techniques with special attention to the transcript dependency structure

Last Modified
  • March 20, 2019
Creator
  • Schwarz, John Carter
    • Affiliation: Gillings School of Global Public Health, Department of Biostatistics
Abstract
  • Gene expression microarray analysis and genetic marker association studies are two common experimental methods in the genetics literature. A growing number of studies combine these two experiments into a single study known as an expression quantitative trait loci (eQTL) study. eQTL data have been analyzed in several organisms, including yeast, maize, mouse, and human. We propose a set of methods to effectively analyze eQTL data by properly transforming and adjusting the analysis models. Our methods address several issues often left out of eQTL analysis: population stratification and adjustment for racial and ethnic classifications, adjustment for multiple covariates, and the influence of extreme outlying observations. Additionally, we propose a statistic that assesses the significance of trans bands (i.e., genetic markers that harbor a large number of eQTL) without the computational burden of permutation testing. Most methods that identify a significance threshold for trans band activity either use simple binning approaches or rely on complex statistical methods that may require many assumptions and restrictions. We take a parametric approach that uses known distributions and simple approximations to derive a significance threshold. Our methods account for correlation structures in the gene expression data and for correlation between genetic markers. Because the approach is parametric, it also avoids permutation testing, which can be computationally daunting even for modestly sized studies. In the second part, we focus on multiple testing in genetic applications. We study family-wise error control by quantifying the probability that our test statistic crosses a defined threshold. The existing methods that employ this technique leave room for adjustments and modifications that allow for use in a variety of situations.
We also explore the idea of treating discoveries as clumps of genetic markers instead of individual markers. By considering a clump as a single discovery, we can redefine the false discovery rate in terms of clumps rather than single hypotheses. Additionally, we provide some modifications to better model complex correlation structures and to handle situations in which limited information on the markers is available.
Rights statement
  • In Copyright
Note
  • "... in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Biostatistics."
Advisor
  • Wright, Fred A.
Place of publication
  • Chapel Hill, NC
Access
  • Open access