Analysis of Admixed Animals using Indirect Haplotype Information from Existing Technologies

Fu, Chen-Ping

Last Modified

March 19, 2019

Creator

Fu, Chen-Ping
- Affiliation: College of Arts and Sciences, Department of Computer Science

Abstract

The use of genotyping and sequencing technologies in genetic studies typically involves inspecting variants defined within a single reference genome. While this definition of genetic variation promotes a simple model of the genome that is easy to organize and analyze, it does not encompass the full breadth of variation possible between individuals. Fortunately, existing technologies capture information about genomic variation outside the original targeted variants. By incorporating these low-level signals, which classical methods generally regard as noise, we can make more accurate inferences about the relationship between admixed animals and their ancestral and parental strains. In this thesis, I use both genotyping microarrays and RNA sequencing data to demonstrate the utility of using signals from ancestral haplotype data to analyze admixed animals. I introduce a novel method for designing a genotyping microarray that provides maximal information about ancestral haplotypes for the admixed population Collaborative Cross (CC). The result is the 78K-marker MegaMUGA array, which achieves high call rates and distinction power in a diverse set of mouse strains. Using probe intensities from microarrays such as the MegaMUGA, I develop methods for founder haplotype inference as well as quantitative trait loci (QTL) mapping. I show that these intensity-based methods outperform traditional genotype call-based methods due to their ability to capture additional information about the local sequence, which I confirm using high-throughput sequencing data within probe regions. In addition to demonstrating my thesis with microarray intensity data, I also use RNA-seq read data from parental strains to estimate allele-specific expression (ASE) in the F1 offspring. 
By directly using parental read data as features in a regularized regression problem, I can achieve accurate estimations of the offspring's expressed gene transcripts and allele-specific expression levels, showing that no matter the data source, incorporating low-level signals directly from ancestral strains provides a more accurate template for analysis of admixed strains.
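The regression idea in the last sentence of the abstract can be illustrated with a toy sketch. It is not the method from the dissertation: the feature definition (per-position read counts), the ridge (L2) penalty, and the names `ridge_two_parents` and `maternal_fraction` are all assumptions made for illustration. The offspring's read counts are modeled as a weighted sum of the two parents' read counts, and the fitted weights give a rough allelic ratio.

```python
def ridge_two_parents(maternal, paternal, offspring, lam=1.0):
    """Fit offspring ~ a*maternal + b*paternal with an L2 penalty `lam`.

    Hypothetical toy model: each list holds read counts at the same
    positions. Solves the 2x2 ridge system (X^T X + lam*I) beta = X^T y.
    """
    if not (len(maternal) == len(paternal) == len(offspring)):
        raise ValueError("all count vectors must have the same length")

    # Entries of X^T X + lam*I for the two-column design matrix.
    s_mm = sum(m * m for m in maternal) + lam
    s_pp = sum(p * p for p in paternal) + lam
    s_mp = sum(m * p for m, p in zip(maternal, paternal))

    # Entries of X^T y.
    s_my = sum(m * y for m, y in zip(maternal, offspring))
    s_py = sum(p * y for p, y in zip(paternal, offspring))

    det = s_mm * s_pp - s_mp * s_mp
    if det == 0:
        raise ValueError("parental profiles are collinear; use lam > 0")

    # Cramer's rule for the 2x2 system.
    a = (s_pp * s_my - s_mp * s_py) / det
    b = (s_mm * s_py - s_mp * s_my) / det
    return a, b


def maternal_fraction(a, b):
    """Share of expression attributed to the maternal allele.

    Negative weights are clipped to zero; returns 0.5 if both are zero.
    """
    a, b = max(a, 0.0), max(b, 0.0)
    total = a + b
    return a / total if total > 0 else 0.5


# Made-up counts: the offspring is an exact 3:1 mix of the two parents.
maternal = [10, 0, 4, 5]
paternal = [0, 8, 0, 5]
offspring = [0.75 * m + 0.25 * p for m, p in zip(maternal, paternal)]

a, b = ridge_two_parents(maternal, paternal, offspring, lam=0.0)
ratio = maternal_fraction(a, b)
```

With `lam=0.0` this reduces to ordinary least squares and recovers the 3:1 mix on the made-up data; a positive `lam` shrinks both weights toward zero, which stabilizes the fit when the parental profiles are nearly identical.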

Date of publication

December 2015

DOI

https://doi.org/10.17615/73y4-cj46

Identifier

Fu_unc_0153D_15515.pdf

Resource type

Dissertation

Rights statement

In Copyright

Advisor

Pardo-Manuel de Villena, Fernando
McMillan, Leonard
Prins, Jan
Zou, Fei
Jojic, Vladimir
Sun, Wei

Degree

Doctor of Philosophy

Degree granting institution

University of North Carolina at Chapel Hill Graduate School

Graduation year

2015

Language

English

Publisher

University of North Carolina at Chapel Hill Graduate School

Place of publication

Chapel Hill, NC

Access right

There are no restrictions to this item.

Date uploaded

January 21, 2016

Items

Title	Date Uploaded	Visibility
Fu_unc_0153D_15515.pdf	2019-04-07	Public
