Computational Approaches to Studying Gene Regulation Using Chromatin Accessibility and Gene Expression Assays

The completion of the Human Genome Project marked the beginning of a new era in genomics characterized by significant improvements in high-throughput sequencing technology and the development of new sequencing-based assays to study a wide array of functional elements and biological properties at the genome-wide scale. These advancements were accompanied by the formation of large, multi-institutional consortia that produced publicly available data sets and functional genomic studies that broadened our understanding of the genome. Previously uncharacterized genomic regions became recognized as important components of gene regulation, but the broader knowledge base of regulatory elements also raised new questions about the growing complexity of gene regulation models. Additionally, quantitative trait loci (QTL) mapping approaches began taking advantage of quantitative sequencing data to study the impacts of genetic variation on molecular phenotypes such as gene expression at the genome-wide level. The popularity of high-throughput methods for studying gene regulation and transcription led to a data deluge that necessitated new statistical methods and bioinformatics solutions for data management, processing, analysis, visualization, and interpretation. Specialized research areas emerged to better glean insights from sequencing data, leading to new challenges and questions. In the following chapters, I present a novel machine learning framework for genomic footprinting, a concept focused on identifying transcription factor (TF) binding sites using chromatin accessibility sequencing data. I demonstrate that my framework outperforms existing methods for classifying TF binding sites via footprinting.
In addition, I investigate characteristics of TF binding sites within chromatin accessibility data and assess technical factors that influence footprinting to provide an improved understanding of the strengths and limitations of using these data for TF binding site prediction. Through a separate study, I investigate the impact of the genotoxic chemical 1,3-butadiene on chromatin accessibility and gene expression in a population of genetically diverse mice. I perform expression QTL (eQTL) and chromatin accessibility QTL (cQTL) mapping in these mice and detect eQTLs and cQTLs in each tissue. In all, the work herein demonstrates multiple computational approaches to studying various gene regulatory relationships and provides insight into the efficacy of these approaches to inform future studies.