Bayesian Model-based Methods for the Analysis of DNA Microarrays with Survival, Genetic, and Sequence Data

Gelfond, Jonathan A. L.

Last Modified

March 20, 2019

Creator

Gelfond, Jonathan A. L.
- Affiliation: Gillings School of Global Public Health, Department of Biostatistics

Abstract

DNA microarrays simultaneously measure the expression of thousands of genes or DNA fragments by means of probes that hybridize specifically to their complementary sequences. Problems in gene expression and microarray data analysis have a prominent role in biostatistics, the biological sciences, and clinical medicine. The first paper proposes a method for finding associations between the survival times of subjects and the gene expression measured by tumor microarrays. Measurement error is known to bias the estimates of survival regression coefficients, and this method minimizes that bias. The latent variable model is shown to detect associations between potentially important genes and survival in a breast cancer dataset that conventional models did not detect, and the method is demonstrated on simulated data to be robust to misspecification. The second paper considers the Expression Quantitative Trait Loci (eQTL) detection problem. An eQTL is a genetic locus that influences gene expression, and the major challenges with this type of data are multiple testing and computational issues. The proposed method extends the Mixture Over Marker (MOM) model to include a structured prior probability that accounts for the transcript location. The new technique exploits the fact that genetic markers are more likely to influence transcripts that share the same location on the genome. The third paper improves the analysis of chromatin immunoprecipitation (ChIP) microarray (ChIP-chip) data. ChIP-chip data analysis estimates the motif of specific Transcription Factor Binding Sites (TFBSs) by comparing the immunoprecipitated (IP) DNA sample, which is enriched for the TFBS, with a control sample of general genomic DNA. The probes on the ChIP-chip array are uniformly spaced along the genome, and probes with relatively high intensity in the IP sample correspond to sequences that are likely to contain the TFBS motif. Existing analytical methods use the array data to discover peaks, or regions of IP enrichment, and then analyze the sequences of these peaks in a separate procedure to discover the motif. The proposed model integrates enrichment peak finding and motif discovery through a Hidden Markov Model (HMM). Performance comparisons are made between the proposed HMM and previously developed methods.

Date of publication

May 2007

DOI

https://doi.org/10.17615/gdde-1s66

Resource type

Dissertation

Rights statement

In Copyright

Advisor

Ibrahim, Joseph

Language

English

Access right

Open access

Date uploaded

October 19, 2010

Items

Title	Date Uploaded	Visibility
Bayesian model-based methods for the analysis of DNA microarrays with survival, genetic, and sequence data	2019-04-09	Public
