Cheminformatics Modeling of Diverse and Disparate Biological Data and the Use of Models to Discover Novel Bioactive Molecules

Ligand-based drug design is a popular and efficient computational approach to facilitating the drug discovery process. Current approaches mainly focus on optimizing computational algorithms to improve the efficiency or accuracy of virtual screening; however, the success of ligand-based drug design relies not only on the effectiveness and robustness of the underlying algorithms but, much more importantly, on the quality of the data used for model building. Although numerous chemical probe databases have emerged recently, few evaluations of data quality and reliability have been performed. Building upon our lab's experience with Quantitative Structure-Activity Relationship (QSAR) methods and on methods developed in the field of cheminformatics, this dissertation focuses on: 1) investigating and comparing the predictive power of QSAR methods with that of other ligand-based drug discovery approaches, such as the Similarity Ensemble Approach (SEA) and Prediction of Activity Spectra for Substances (PASS); 2) using QSAR methods to validate the consistency and reliability of biomedical data in disparate data sources; and 3) developing a novel, rigorous, and dataset-specific QSAR workflow for application to multiple therapeutic targets in order to identify diverse, highly potent hits in practical virtual screening projects. These studies thoroughly investigate current approaches to ligand-based drug discovery, explore the consistency and quality of major annotated cheminformatics databases, and identify many pharmaceutically important ligands. Their results strongly challenge some popular multi-target profile prediction methods and contribute to the development of cheminformatics by emphasizing the importance of determining trustworthy data sources.