Bayesian Model Based Approaches In The Analysis Of Chromatin Structure And Motif Discovery

Last Modified
  • March 20, 2019
Creator
  • Mitra, Ritendranath
    • Affiliation: Gillings School of Global Public Health, Department of Biostatistics
  • Efficient detection of transcription factor (TF) binding sites is an important and unsolved problem in computational genomics. Because of the poor predictive ability of motif-finding algorithms, together with the recent proliferation of high-throughput genomic technologies, there has been a drive to use secondary information, such as the positioning of nucleosomes, to improve predictions. Nucleosomes prevent TF binding at the sites they occupy by blocking TF access to the DNA. We aimed to construct an accurate map of nucleosome-free regions (NFRs) based on data from high-throughput genomic tiling arrays in yeast. Direct use of hidden Markov models (HMMs) is not always applicable because of variable-sized gaps and missing data. We therefore extended the hidden Markov model procedure to a continuous-time version while efficiently incorporating DNA sequence features that are relevant to nucleosome formation. Simulation studies and an application to a yeast nucleosomal assay demonstrate the advantages of the new method. The established biological role of nucleosomes in relation to TF binding led us to formulate a joint model in the fourth chapter. The algorithm was applied to the FAIRE data set, and comparisons were made with existing motif search algorithms. The fifth chapter deals with HMM asymptotics. We obtained results on the consistency, asymptotic normality, and contiguity of a hidden Markov model. These results support our inference about the convergence properties of the posterior and the consistency of the Bayesian posterior estimates. They lead to the conclusion that Bayesian inference for an HMM run on sufficiently large datasets (which is typical for genomic data) brings us very close to the underlying true parameters, as in the case of iid models. The result is general enough to justify HMM inference across a wide variety of datasets.
Date of publication
Resource type
Rights statement
  • In Copyright
  • "... in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Biostatistics."
  • Sen, Pranab Kumar
Place of publication
  • Chapel Hill, NC
  • Open access
