A Method for Construction of a Splice Graph from RNA Sequence Data
UNC Chapel Hill Undergraduate Honors Theses Collection

Although genetic information in the cell is stored in DNA, the RNA transcripts obtained from DNA indicate which genes are actively transcribed. Modern high-throughput sequencing techniques allow accurate sequencing of short RNA fragments, which may then be aligned to a reference genome. These alignments can be summarized by constructing a "splice graph", in which nodes represent genomic coordinates and edges represent sequences that are retained in or spliced out of observed transcripts. Each full-length transcript corresponds to a path through the graph. I have written software to build a splice graph from a collection of short reads aligned to a reference genome. This software incorporates variants observed relative to the reference genome as additional paths. I have also written tools to manipulate and traverse such graphs. An application of this graph is correction of noisy full-length RNA transcripts. Such a transcript may be aligned to paths through the graph in order to identify its original sequence.
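
To make the splice graph idea concrete, the following is a minimal sketch in Python and not the thesis software itself. It assumes a simplified, hypothetical input: each aligned read is a sorted list of `(start, end)` blocks whose boundaries coincide with exon boundaries. The class name `SpliceGraph`, its methods, and the toy coordinates are all illustrative assumptions. Variant paths and the alignment of noisy transcripts to the graph are not covered here.

```python
from collections import defaultdict


class SpliceGraph:
    """Toy splice graph (illustrative only).

    Nodes are genomic coordinates. Each edge is labeled "exon" (sequence
    retained in a transcript) or "intron" (sequence spliced out).
    """

    def __init__(self):
        # start coordinate -> {(end coordinate, kind): number of supporting reads}
        self.edges = defaultdict(lambda: defaultdict(int))

    def add_read(self, blocks):
        """Add one aligned read.

        Assumption: `blocks` is a sorted list of (start, end) aligned
        segments, and the gap between consecutive blocks is a splice.
        """
        for i, (start, end) in enumerate(blocks):
            self.edges[start][(end, "exon")] += 1
            if i + 1 < len(blocks):
                next_start = blocks[i + 1][0]
                self.edges[end][(next_start, "intron")] += 1

    def paths(self, source, sink):
        """Yield every path from source to sink as a list of
        (start, end, kind) edges, using an iterative depth-first search.
        Coordinates only increase along edges, so the search terminates.
        """
        stack = [(source, [])]
        while stack:
            node, path = stack.pop()
            if node == sink:
                yield path
                continue
            # .get avoids creating empty entries for coordinates with no outgoing edges
            for end, kind in sorted(self.edges.get(node, {})):
                stack.append((end, path + [(node, end, kind)]))


# Three hypothetical reads over exons 100-200, 300-400, and 500-600.
graph = SpliceGraph()
graph.add_read([(100, 200), (300, 400)])
graph.add_read([(100, 200), (500, 600)])  # skips the middle exon
graph.add_read([(300, 400), (500, 600)])

# Each path corresponds to a candidate transcript; keep only its exon edges.
isoforms = sorted(
    [(s, e) for s, e, kind in path if kind == "exon"]
    for path in graph.paths(100, 600)
)
for exons in isoforms:
    print(exons)
```

In this toy example the graph contains two source-to-sink paths, one that includes all three exons and one that skips the middle exon. The per-edge read counts are kept because they could later be used to filter weakly supported edges, although that design choice is an assumption of this sketch.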