Affiliation: College of Arts and Sciences, Department of Statistics and Operations Research
Supervised learning techniques are widely used in diverse scientific disciplines such as biology and neuroscience. Among them, penalized regression is especially popular, partly because of its simple formulation and good performance in practice. Despite the success of this technique, many challenges remain. The first challenge is to develop new methods that efficiently incorporate the structure and correlation information among predictors. Moreover, in many practical applications such as computational neuroscience, we need to predict multiple correlated responses (e.g., class label and clinical scores). It is therefore important to study new techniques that predict those correlated responses jointly, using not only the correlation information among responses but also the structure and correlation information among predictors. Furthermore, in modern scientific research, many data sets are collected from different modalities (sources or types). Since the observations of a certain modality can be missing completely, block-missing multi-modality data are very common. Flexible and efficient statistical methods applicable to block-missing multi-modality data require careful study. In this dissertation, we propose several new supervised learning techniques to address the challenges mentioned above. Both numerical and theoretical studies are presented to demonstrate the effectiveness of the proposed methods. Practical applications of these methods to the Alzheimer's Disease Neuroimaging Initiative (ADNI) data set are provided as well.