Statistical methods in chemoinformatics

Last Modified
  • March 21, 2019
Creator
  • Borysov, Petro
    • Affiliation: College of Arts and Sciences, Department of Statistics and Operations Research
Abstract
  • In drug discovery, predicting the activity of a compound, or its class label, from its chemical structure is known as Quantitative Structure-Activity Relationship (QSAR) modeling. In the first chapter of this dissertation we propose a method that identifies subregions of the full chemistry space where linear classification works significantly better than on the full space. The performance of the proposed method is illustrated on simulated examples and on real data sets. In the second chapter we study variables with low variance, which are often removed from the analysis. We show that some of them contain a significant amount of useful information and can help with prediction; for example, they help identify possibly mislabeled compounds. In the final chapter we study the asymptotic behavior of hierarchical clustering when both sample size and dimension grow to infinity. We derive explicit signal-versus-noise boundaries between different types of clustering behavior, and we show that the clustering behavior within the boundaries is the same across a wide spectrum of asymptotic settings.
Rights statement
  • In Copyright
Advisor
  • Hannig, Jan
Degree
  • Doctor of Philosophy
Degree granting institution
  • University of North Carolina at Chapel Hill
Graduation year
  • 2013