Statistical methods in chemoinformatics

Last Modified
  • March 21, 2019
Creator
  • Borysov, Petro
    • Affiliation: College of Arts and Sciences, Department of Statistics and Operations Research
Abstract
  • In drug discovery, predicting the activity of a compound, or its class label, from its chemical structure is known as Quantitative Structure-Activity Relationship (QSAR) modeling. In the first chapter of this dissertation, we propose a method that identifies subregions of the full chemistry space where linear classification performs significantly better than on the full space. The performance of the proposed method is illustrated on simulated examples and on several real data sets. In the second chapter, we study variables with low variance, which are often removed from the analysis. We show that some of them contain significant amounts of useful information and can improve prediction. For example, they help identify possibly mislabeled compounds. In the final chapter of the dissertation, we study the asymptotic behavior of hierarchical clustering as both the sample size and the dimension grow to infinity. We derive explicit signal-versus-noise boundaries between different types of clustering behavior. We also show that the clustering behavior within these boundaries is the same across a wide spectrum of asymptotic settings.
Rights statement
  • In Copyright
Advisor
  • Hannig, Jan
Degree
  • Doctor of Philosophy
Degree granting institution
  • University of North Carolina at Chapel Hill
Graduation year
  • 2013
