Analysis of Admixed Animals using Indirect Haplotype Information from Existing Technologies

Last Modified
  • March 19, 2019
  • Fu, Chen-Ping
    • Affiliation: College of Arts and Sciences, Department of Computer Science
  • The use of genotyping and sequencing technologies in genetic studies typically involves inspecting variants defined within a single reference genome. While this definition of genetic variation promotes a simple model of the genome that is easy to organize and analyze, it does not encompass the full breadth of variation possible between individuals. Fortunately, existing technologies capture information about genomic variation outside the original targeted variants. By incorporating these low-level signals, which classical methods generally regard as noise, we can make more accurate inferences about the relationship between admixed animals and their ancestral and parental strains. In this thesis, I use both genotyping microarrays and RNA sequencing data to demonstrate the utility of using signals from ancestral haplotype data to analyze admixed animals. I introduce a novel method for designing a genotyping microarray that provides maximal information about ancestral haplotypes for the admixed population Collaborative Cross (CC). The result is the 78K-marker MegaMUGA array, which achieves high call rates and distinction power in a diverse set of mouse strains. Using probe intensities from microarrays such as the MegaMUGA, I develop methods for founder haplotype inference as well as quantitative trait loci (QTL) mapping. I show that these intensity-based methods outperform traditional genotype call-based methods due to their ability to capture additional information about the local sequence, which I confirm using high-throughput sequencing data within probe regions. In addition to demonstrating my thesis with microarray intensity data, I also use RNA-seq read data from parental strains to estimate allele-specific expression (ASE) in the F1 offspring. By directly using parental read data as features in a regularized regression problem, I can achieve accurate estimations of the offspring's expressed gene transcripts and allele-specific expression levels, showing that no matter the data source, incorporating low-level signals directly from ancestral strains provides a more accurate template for analysis of admixed strains.
Date of publication
Resource type
Rights statement
  • In Copyright
  • Zou, Fei
  • Pardo-Manuel de Villena, Fernando
  • Prins, Jan
  • Sun, Wei
  • McMillan, Leonard
  • Jojic, Vladimir
  • Doctor of Philosophy
Degree granting institution
  • University of North Carolina at Chapel Hill Graduate School
Graduation year
  • 2015
Place of publication
  • Chapel Hill, NC
  • There are no restrictions to this item.
