Weighted graph matching approaches to structure comparison and alignment and their application to biological problems Public Deposited

Downloadable Content

Download PDF
Last Modified
  • March 21, 2019
  • Choi, Kwangbom
    • Affiliation: College of Arts and Sciences, Department of Computer Science
  • In pattern recognition and machine learning, comparing and contrasting are the most fundamental operations: from similarities we derive common rules encoded in the systems, while from difference we infer what makes each system unique. The biological sciences are not an exception to these operations and, in fact, rely heavily on their use. More recently, the emergence of high-throughput measurement technologies has highlighted the need for novel approaches capable of enhancing our ability to understand complex relationships in these data sets. Often, these relationships can be best represented using graphs (or networks), where nodes are biochemical components such as genes, RNAs, proteins or metabolites, and edges indicate the types (and often quality) of relationship. Comparison of relationships is generally performed by aligning the networks of interest. For example, for protein-protein interaction (PPI) networks, the goal of network alignment is to find mappings between nodes (proteins) which are highly useful in identifying signaling pathways or protein complexes and to annotate genes of unknown functionality from subnetworks conserved across multiple species. Phylogenetic trees are also graph structures that describe evolutionary relationship among groups of organisms and their hypothetical ancestors. As it has been shown in a large volume of previous work, comparison of trees also opens the possibility of supporting or building new evolutionary hypotheses: for example, the detection of host-parasite symbiosis, gene coevolution as a signal of physical interactions among genes, or nonstandard events such as horizontal gene transfer. The goal of this thesis is to develop and implement a flexible set of algorithms and methodologies that can be used for the alignment of trees and/or networks having various sizes and properties. We first define a new relaxed model of graph isomorphism in which the shortest path lengths are preserved between corresponding intra-node pairs. Then, based on Google's PageRank model, we present a new tree matching approach, phyloAligner, which resolves several weakness of previous approaches. We further generalize this tree matching algorithm to a broader flexible framework, MCS-Finder, as a scalable and error-tolerant approximation for identifying the maximum common substructure between weighted graphs or distance matrices of different sizes. For phylogenetic trees with weighted edges and strictly-labeled nodes, multidimensional scaling-based methods, xCEED, can effectively evaluate the structural similarity and identify which regions are congruent/incongruent. These methods successfully detected coevolutionary signals as well as nonstandard evolutionary events such as horizontal gene transfer, and recovered interaction specificity between multigene families.
Date of publication
Resource type
Rights statement
  • In Copyright
  • "... in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science."
  • Gomez, Shawn
Degree granting institution
  • University of North Carolina at Chapel Hill
Place of publication
  • Chapel Hill, NC
  • Open access

This work has no parents.