Correcting Reference Bias in High-throughput Sequencing Analysis

Creator

Huang, Shunping
- Affiliation: College of Arts and Sciences, Department of Computer Science

Abstract

Mapping reads to a reference sequence is a common step when analyzing high-throughput sequencing data. The choice of reference is critical because its effect on quantitative sequence analysis is non-negligible. Recent studies suggest that aligning to a single standard reference sequence, as is common practice, can introduce a bias that depends on the genetic distance of the target sequences from the reference. To avoid this bias, researchers have resorted to using modified reference sequences. Even with this improvement, several limitations remain unresolved, including reduced mapping ratios, shifts in read mappings, and the choice of which variants to include to remove biases. To address these issues, I propose novel and generic pipelines that integrate the genomic variations from known or suspected founders into reference sequences and then perform read alignment. Experiments show that my pipelines align more reads with much lower reference bias than the traditional pipeline, in which reads are mapped against the standard reference sequence. They can be applied to a wide range of organisms, including inbreds, F1s, and outbreds, and to various high-throughput sequencing approaches, such as RNA-seq, DNA-seq, and ChIP-seq.
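
The idea of integrating known variants into a reference, and the resulting shift in read coordinates, can be illustrated with a minimal sketch. This is not the pipeline described in the dissertation. The function names (`build_pseudo_reference`, `to_reference_position`) and the variant tuple layout are hypothetical, chosen only for illustration, and the sketch assumes non-overlapping variants on a single sequence.

```python
from typing import List, Tuple

# Hypothetical variant layout: (0-based reference position, ref allele, alt allele).
Variant = Tuple[int, str, str]


def build_pseudo_reference(
    reference: str, variants: List[Variant]
) -> Tuple[str, List[Tuple[int, int]]]:
    """Apply non-overlapping variants to `reference`.

    Returns the modified sequence and a list of (pseudo_start, shift) pairs.
    For a base at pseudo position p >= pseudo_start, the matching reference
    position is p - shift, until the next breakpoint takes over.
    """
    pieces = []
    offsets = [(0, 0)]
    cursor = 0  # next unread position in the reference
    shift = 0   # cumulative length difference (pseudo minus reference)
    for pos, ref, alt in sorted(variants):
        if pos < cursor:
            raise ValueError("overlapping variants are not supported")
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"ref allele mismatch at position {pos}")
        pieces.append(reference[cursor:pos])
        pieces.append(alt)
        cursor = pos + len(ref)
        shift += len(alt) - len(ref)
        if len(alt) != len(ref):
            # An indel changes the coordinates of everything downstream.
            offsets.append((cursor + shift, shift))
    pieces.append(reference[cursor:])
    return "".join(pieces), offsets


def to_reference_position(pseudo_pos: int, offsets: List[Tuple[int, int]]) -> int:
    """Map a pseudo-reference position back to the standard reference.

    Positions that fall inside inserted bases have no true counterpart in the
    reference; this sketch does not treat them specially.
    """
    shift = 0
    for start, off in offsets:
        if pseudo_pos >= start:
            shift = off
        else:
            break
    return pseudo_pos - shift


reference = "ACGTACGTAC"
variants = [(2, "G", "T"), (5, "C", "CAA")]  # one SNP, one 2-base insertion
pseudo, offsets = build_pseudo_reference(reference, variants)
```

A read aligned to the pseudo-reference downstream of the insertion would be reported two bases to the right of its position on the standard reference, which is why alignments need to be mapped back (here via `to_reference_position`) before results from different references are compared.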

Title	Date Uploaded	Visibility
Huang_unc_0153D_15318.pdf	2019-04-09	Public