Identifying genetic mechanisms of cardiometabolic traits and diseases using quantitative sequence data

Cardiometabolic diseases are a worldwide health concern. Genetic studies have identified hundreds of loci associated with these diseases and with cardiometabolic risk factors, but gaps remain in our understanding of the biological mechanisms responsible for these associations. Sequence data from quantitative experiments such as DNase-seq and ChIP-seq, which identify genomic regions that regulate gene transcription, are helping to fill these gaps. In these data, allelic imbalance (the enrichment of one allele at a heterozygous site) can indicate allelic differences in transcriptional regulation, but reference mapping biases in sequence alignments prevent accurate detection of allelic imbalance. We describe a pipeline, AA-ALIGNER, that removes mapping biases at heterozygous sites and increases the accuracy of allelic imbalance detection in samples with any amount of genotype data available. When complete genotype information is not available, AA-ALIGNER detects allelic imbalance more accurately at imputed heterozygous sites than at heterozygous sites predicted from the sequence data. At predicted heterozygous sites, imbalance detection is more accurate at common variants than at other variants. Additionally, imbalance detection with AA-ALIGNER is robust to a variety of experimental and analytical parameters. Using AA-ALIGNER, we detected evidence of allelic imbalance at 22,414 heterozygous sites in data from samples relevant to cardiometabolic disease and risk factors. At a majority of these sites, we identified binding motifs for one of the proteins whose data showed imbalance, and we found evidence that imbalance in data for this protein is associated with imbalance in data for other proteins. Additionally, a subset of the sites of allelic imbalance are located at expression quantitative trait loci and/or genome-wide association loci for cardiometabolic traits and diseases.
These sites are strong candidates for experimental study, and we report experimental evidence of allelic differences in protein binding, enhancer activity, and/or the regulation of specific genes for a handful of them. Using allelic imbalance detection, we have detected differences in protein binding across the genome, providing valuable insight into mechanisms of transcriptional regulation. Focusing on cardiometabolic diseases and risk factors, this work demonstrates the utility of allelic imbalance detection in studying genetic effects on the regulation of gene transcription at complex disease- and trait-associated loci.
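
As an illustration of what "allelic imbalance at a heterozygous site" means in practice, the following is a minimal sketch of one common way to test for it: an exact two-sided binomial test on the reference and alternate read counts at a single site. The abstract does not state which statistical test AA-ALIGNER uses, so the choice of test, the function name `binomial_two_sided_p`, and the default expected reference fraction of 0.5 are all assumptions made for this example, not a description of the pipeline.

```python
from math import comb


def binomial_two_sided_p(ref_reads, alt_reads, expected_ref_fraction=0.5):
    """Exact two-sided binomial p-value for read counts at one heterozygous site.

    Hypothetical helper for illustration only; not part of AA-ALIGNER.
    Assumes that, with no imbalance and no mapping bias, each read carries
    the reference allele with probability `expected_ref_fraction`.
    Intended for modest read depths (very deep sites can overflow floats).
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("site has no reads")
    p = expected_ref_fraction
    # Probability of every possible reference-read count from 0 to n.
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum the outcomes that are no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(x for x in pmf if x <= observed * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # Made-up counts: 15 reference reads and 5 alternate reads at one site.
    print(binomial_two_sided_p(15, 5))
```

A small p-value suggests that one allele is enriched beyond what sampling noise would explain. Residual reference mapping bias shifts the expected reference fraction away from 0.5, which is why bias removal matters before a test of this kind is applied.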