Fast Bayesian methods for genetic mapping applicable for high-throughput datasets

Last Modified
  • March 21, 2019
  • Chang, Yu-Ling
    • Affiliation: Gillings School of Global Public Health, Department of Biostatistics
  • QTL mapping is a statistical method for detecting possible gene locations (called quantitative trait loci, or QTL) and the effects of those genes on the variation in a quantitative phenotype, such as the height of a corn plant. QTL mapping has become an important tool in genetic analysis and has made important contributions to the fields of medicine and agriculture. Traditional QTL mapping methods scan the whole genome and calculate the profile likelihood ratio test statistic at each putative QTL location. The maxima of the test statistics for all putative QTL locations are compared with the genome-wide threshold to identify the QTL. In this thesis, we propose several fast Bayesian methods for QTL mapping, which not only provide direct approximate QTL posterior probabilities at all putative gene locations, but also offer highly interpretable posterior densities for linkage, without the need for Bayes factors in model selection. Applications to simulated and real data show that these methods are highly efficient and faster than the alternatives: grid search integration, importance sampling, Markov chain Monte Carlo (MCMC) sampling, and adaptive quadrature. Our results also provide insight into the connection between the profile likelihood ratio test statistic and the posterior probability for linkage. The results of these methods are easy to interpret and have the advantage of producing posterior densities for all model parameters. We infer the presence of QTL at the locations with the largest posterior probabilities. Because of their high speed and accuracy, these methods are well suited to high-throughput data sets, e.g. eQTL data sets. eQTL analysis is an important application of QTL mapping to microarray data sets, in which thousands of transcripts are treated as phenotypes, and it provides insight into the natural variation in gene expression levels.
The approach offers highly interpretable direct linkage posterior densities for each transcript, and opens new avenues for research in this area. Biologically attractive priors involving explicit hyperparameters for the probabilities of cis-acting and trans-acting QTL are easily incorporated. We also extend the single-QTL Bayesian method to multiple QTL. The advantage of this extension is the simultaneous detection of multiple QTL and appropriate modelling of their joint effects. Multiple-QTL mapping can be computationally intensive, even for our efficient Bayesian approaches. Thus, a fully Bayesian multiple-QTL approach for high-throughput datasets remains challenging. We investigate a heuristic for conditional search on the two-location search space that shows promise for identifying the global maximum, and offers the potential for extended approximate Bayesian approaches.
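As a rough illustration of the genome scan described in the abstract, and of how a likelihood ratio statistic can be related to a posterior probability over locations, the following Python sketch may help. It is not the method developed in the thesis. It assumes a simple marker regression with normal errors, simulated and unlinked 0/1 genotypes, exactly one QTL, and a uniform prior over markers. It also substitutes the profile likelihood for the integrated likelihood, which is only a crude approximation. The function names `lrt_scan` and `location_posterior` are hypothetical.

```python
import numpy as np


def lrt_scan(genotypes, phenotype):
    """Likelihood ratio statistic at each marker.

    Compares y ~ 1 + g against y ~ 1 under normal errors, where
    2 * (loglik1 - loglik0) = n * log(RSS0 / RSS1).
    Assumes every marker column is polymorphic (non-zero variance).
    """
    y = np.asarray(phenotype, dtype=float)
    G = np.asarray(genotypes, dtype=float)
    n = y.size
    yc = y - y.mean()                 # centred phenotype
    Gc = G - G.mean(axis=0)           # centred genotype columns
    rss0 = yc @ yc                    # residual sum of squares, null model
    sxx = (Gc ** 2).sum(axis=0)
    sxy = Gc.T @ yc
    rss1 = rss0 - sxy ** 2 / sxx      # residual sum of squares, one-marker model
    return n * np.log(rss0 / rss1)


def location_posterior(lrt, prior=None):
    """Approximate posterior over locations, given exactly one QTL.

    Illustrative assumption: the weight of each location is
    prior * exp(LRT / 2), i.e. the profile likelihood stands in for
    the integrated likelihood.
    """
    lrt = np.asarray(lrt, dtype=float)
    if prior is None:
        prior = np.full(lrt.size, 1.0 / lrt.size)   # uniform prior over markers
    log_w = 0.5 * lrt + np.log(prior)
    log_w -= log_w.max()              # stabilise before exponentiating
    w = np.exp(log_w)
    return w / w.sum()


# Simulated backcross-style data (hypothetical): 200 individuals, 50 markers,
# one QTL at marker index 20. Markers are drawn independently, which ignores
# the linkage between neighbouring markers that real data would show.
rng = np.random.default_rng(0)
n_ind, n_markers, qtl_index = 200, 50, 20
genotypes = rng.integers(0, 2, size=(n_ind, n_markers))
phenotype = 1.5 * genotypes[:, qtl_index] + rng.normal(size=n_ind)

lrt = lrt_scan(genotypes, phenotype)
posterior = location_posterior(lrt)
best = int(np.argmax(posterior))
print("marker with the largest approximate posterior probability:", best)
```

In a traditional scan, the largest value in `lrt` would be compared with a genome-wide threshold. In the sketch, the same statistics are instead normalised into probabilities that sum to one across markers, which is one simple way to see why the two quantities are closely connected.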
Date of publication
Resource type
Rights statement
  • In Copyright
  • Wright, Fred A.
Degree granting institution
  • University of North Carolina at Chapel Hill
  • Open access