A scalable and memory-efficient algorithm for de novo transcriptome assembly of non-model organisms

Creator

Sze, Sing-Hoi
- Other Affiliation: Department of Computer Science and Engineering, Texas A&M University, College Station, TX 77843, USA; Department of Biochemistry & Biophysics, Texas A&M University, College Station, TX 77843, USA
Pimsler, Meaghan L
- Other Affiliation: Department of Entomology, Texas A&M University, College Station, TX 77843, USA
Tomberlin, Jeffery K
- Other Affiliation: Department of Entomology, Texas A&M University, College Station, TX 77843, USA
Jones, Corbin
- Affiliation: College of Arts and Sciences, Department of Biology
Tarone, Aaron M
- Other Affiliation: Department of Entomology, Texas A&M University, College Station, TX 77843, USA

Abstract

Background: With the increased availability of de novo assembly algorithms, it is feasible to study entire transcriptomes of non-model organisms. Although some algorithms are specifically designed to perform transcriptome assembly from high-throughput sequencing data, they are very memory-intensive, which limits their application to small data sets with few libraries.

Results: We develop a transcriptome assembly algorithm that recovers alternatively spliced isoforms and expression levels while using as many RNA-Seq libraries as possible, totaling hundreds of gigabases of data. We develop new techniques so that the computations can be performed on a computing cluster with a moderate amount of physical memory.

Conclusions: Our strategy minimizes memory consumption while achieving comparable or improved accuracy relative to existing algorithms. It also supports incremental updates of assemblies as new libraries become available.

Title	Date Uploaded
12864_2017_article_3735.pdf	2019-05-06