Weighted graph matching approaches to structure comparison and alignment and their application to biological problems

Choi, Kwangbom

Last Modified

March 21, 2019

Creator

Choi, Kwangbom
- Affiliation: College of Arts and Sciences, Department of Computer Science

Abstract

In pattern recognition and machine learning, comparing and contrasting are the most fundamental operations: from similarities we derive the common rules encoded in systems, while from differences we infer what makes each system unique. The biological sciences are no exception and, in fact, rely heavily on these operations. More recently, the emergence of high-throughput measurement technologies has highlighted the need for novel approaches capable of enhancing our ability to understand complex relationships in these data sets. Often, these relationships are best represented as graphs (or networks), in which nodes are biochemical components such as genes, RNAs, proteins or metabolites, and edges indicate the type (and often the quality) of each relationship. Relationships are generally compared by aligning the networks of interest. For protein-protein interaction (PPI) networks, for example, the goal of network alignment is to find mappings between nodes (proteins); such mappings are highly useful for identifying signaling pathways or protein complexes and for annotating genes of unknown function from subnetworks conserved across multiple species. Phylogenetic trees are also graph structures; they describe the evolutionary relationships among groups of organisms and their hypothetical ancestors. As a large body of previous work has shown, comparing trees also opens the possibility of supporting or building new evolutionary hypotheses, for example the detection of host-parasite symbiosis, gene coevolution as a signal of physical interaction among genes, or nonstandard events such as horizontal gene transfer. The goal of this thesis is to develop and implement a flexible set of algorithms and methodologies for aligning trees and/or networks of various sizes and properties.
We first define a new relaxed model of graph isomorphism in which shortest path lengths are preserved between corresponding node pairs. Then, based on Google's PageRank model, we present a new tree matching approach, phyloAligner, which resolves several weaknesses of previous approaches. We further generalize this tree matching algorithm into a broader, flexible framework, MCS-Finder, a scalable and error-tolerant approximation for identifying the maximum common substructure between weighted graphs or distance matrices of different sizes. For phylogenetic trees with weighted edges and strictly labeled nodes, multidimensional scaling-based methods (xCEED) can effectively evaluate structural similarity and identify which regions are congruent or incongruent. These methods successfully detected coevolutionary signals as well as nonstandard evolutionary events such as horizontal gene transfer, and recovered interaction specificity between multigene families.
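To make the relaxed isomorphism model concrete, the following is a minimal sketch that only *checks* whether a given node mapping preserves shortest path lengths between every pair of mapped nodes. It assumes undirected graphs with non-negative edge weights stored as adjacency dictionaries; the helper names (`undirected`, `shortest_paths`, `preserves_path_lengths`) are hypothetical and this is not the phyloAligner or MCS-Finder code. Finding a good mapping in the first place, which is the problem the thesis addresses, is not attempted here.

```python
import heapq
from itertools import combinations

INF = float("inf")


def undirected(edges):
    """Build a symmetric adjacency dict from (u, v, weight) triples (hypothetical helper)."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def shortest_paths(adj, source):
    """Dijkstra's algorithm: shortest-path lengths from source, assuming non-negative weights."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # stale heap entry; a shorter path to u was already found
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def preserves_path_lengths(adj_a, adj_b, mapping, tol=1e-9):
    """Return True if each pair of mapped nodes in graph A has the same
    shortest-path length (within tol) as its image pair in graph B.
    Edges themselves need not correspond, which is what makes the model 'relaxed'."""
    dists_a = {u: shortest_paths(adj_a, u) for u in mapping}
    dists_b = {mapping[u]: shortest_paths(adj_b, mapping[u]) for u in mapping}
    for u, v in combinations(mapping, 2):
        da = dists_a[u].get(v, INF)
        db = dists_b[mapping[u]].get(mapping[v], INF)
        if da == INF and db == INF:
            continue  # disconnected in both graphs: treated as consistent
        if abs(da - db) > tol:
            return False
    return True


if __name__ == "__main__":
    # Graph A has an extra a-c edge that B lacks, but it never lies on a shortest path.
    A = undirected([("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 5.0)])
    B = undirected([("x", "y", 1.0), ("y", "z", 2.0)])
    print(preserves_path_lengths(A, B, {"a": "x", "b": "y", "c": "z"}))
```

In this usage example, A and B are not isomorphic in the strict sense (A has three edges, B has two), yet the mapping passes the check because every mapped pair keeps its shortest path length (1, 2 and 3).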

Date of publication

December 2011

DOI

https://doi.org/10.17615/4s0v-5929

Resource type

Dissertation

Rights statement

In Copyright

Note

"... in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science."

Advisor

Gomez, Shawn

Degree granting institution

University of North Carolina at Chapel Hill

Language

English

Publisher

University of North Carolina at Chapel Hill

Place of publication

Chapel Hill, NC

Access right

Open access

Date uploaded

March 18, 2013

Relations

Items

One public file, uploaded 2019-04-11.