FMLRC: Hybrid long read error correction using an FM-index

  • Wang, Jeremy R
    • Affiliation: School of Medicine, Department of Genetics
  • McMillan, Leonard
    • Affiliation: College of Arts and Sciences, Department of Computer Science
  • Holt, James
    • Affiliation: College of Arts and Sciences, Department of Computer Science
  • Jones, Corbin
    • Affiliation: School of Medicine, Integrative Program for Biological and Genome Sciences, College of Arts and Sciences, Department of Computer Science
Abstract
  • Background: Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. However, the cost and throughput of these technologies limit their application to complex genomes. One solution is to decrease the cost and time to assemble novel genomes by leveraging “hybrid” assemblies that use long reads for scaffolding and short reads for accuracy.
  • Results: We describe a novel method leveraging a multi-string Burrows-Wheeler Transform with auxiliary FM-index to correct errors in long read sequences using a set of complementary short reads. We demonstrate that our method efficiently produces significantly more high-quality corrected sequence than existing hybrid error-correction methods. We also show that our method produces more contiguous assemblies, in many cases, than existing state-of-the-art hybrid and long-read-only de novo assembly methods.
  • Conclusion: Our method accurately corrects long read sequence data using complementary short reads. We demonstrate higher total throughput of corrected long reads and a corresponding increase in contiguity of the resulting de novo assemblies. Improved throughput and computational efficiency relative to existing methods will help make more economical use of emerging long read sequencing technologies.
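The abstract names a multi-string Burrows-Wheeler Transform with an FM-index built over the short reads. As a rough illustration of what such an index offers, the Python sketch below builds a naive multi-string BWT from a few toy reads and counts k-mer occurrences by backward search. It is not the FMLRC implementation: the names `build_multi_string_bwt` and `FMIndex` are hypothetical, the construction sorts every rotation in memory (practical only for tiny inputs), and how such counts would drive long read correction is not shown here.

```python
def build_multi_string_bwt(reads):
    """Naive multi-string BWT: sort all rotations of every read + '$'.

    Assumption for illustration only; real tools use far more
    memory-efficient construction methods.
    """
    rotations = []
    for read in reads:
        s = read + "$"  # '$' sorts before A, C, G, T in ASCII
        rotations.extend(s[i:] + s[:i] for i in range(len(s)))
    rotations.sort()
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


class FMIndex:
    """Minimal FM-index supporting occurrence counting (hypothetical helper)."""

    def __init__(self, bwt):
        self.size = len(bwt)
        alphabet = sorted(set(bwt))
        # first[c]: number of symbols in the BWT strictly smaller than c.
        self.first = {}
        total = 0
        for c in alphabet:
            self.first[c] = total
            total += bwt.count(c)
        # occ[c][i]: number of occurrences of c in bwt[:i].
        self.occ = {c: [0] * (self.size + 1) for c in alphabet}
        for i, ch in enumerate(bwt):
            for c in alphabet:
                self.occ[c][i + 1] = self.occ[c][i] + (ch == c)

    def count(self, pattern):
        """Count occurrences of pattern across all reads via backward search."""
        lo, hi = 0, self.size
        for c in reversed(pattern):
            if c not in self.first:
                return 0
            lo = self.first[c] + self.occ[c][lo]
            hi = self.first[c] + self.occ[c][hi]
            if lo >= hi:
                return 0
        return hi - lo


# Toy short reads (made up for this example).
reads = ["ACGTACGT", "CGTACG", "TTACGA"]
index = FMIndex(build_multi_string_bwt(reads))
print(index.count("ACG"), index.count("CGT"), index.count("GGG"))
```

Each query takes time proportional to the pattern length rather than to the size of the read set. The occurrence table here stores a full row per symbol for clarity; a practical index would sample it to save memory.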
Date of publication
  • 2018 Feb 09
DOI
  • doi:10.1186/s12859-018-2051-3
Resource type
  • Article
Rights statement
  • In Copyright
Rights holder
  • The Author(s)
Language
  • English
Bibliographic citation
  • BMC Bioinformatics. 2018 Feb 09;19(1):50
Publisher
  • BioMed Central
