Affiliation: College of Arts and Sciences, Department of Statistics and Operations Research
This research covers two major areas. The first is the asymptotic properties of Principal Component Analysis (PCA) and sparse PCA. The second is the application of functional data analysis to tree-structured data objects.

A general asymptotic framework is developed for studying consistency properties of PCA. Assuming the spike population model, the framework considers increasing sample size, increasing dimension (or number of variables), and increasing spike sizes (the relative size of the population eigenvalues). Our framework includes several previously studied domains of asymptotics as special cases and, for the first time, allows one to investigate connections and transitions among the various domains. This unification provides new theoretical insights.

Sparse PCA methods are efficient tools for reducing the dimension (or number of variables) of complex data. Sparse principal components (PCs) can be easier to interpret than conventional PCs because most loadings are zero. We study the asymptotic properties of these sparse PC directions for scenarios with fixed sample size and increasing dimension, i.e., High Dimension, Low Sample Size (HDLSS). We find a large set of sparsity assumptions under which sparse PCA is still consistent even when conventional PCA is strongly inconsistent. The consistency of sparse PCA is characterized along with rates of convergence. The boundaries of the consistent region are clarified using an oracle result.

Functional data analysis has been very successful in the analysis of data lying in standard Euclidean space, such as curve data. However, with recent developments in fields such as medical image analysis, data lying in non-Euclidean spaces, such as tree-structured data, increasingly present great challenges to statistical analysis. Here, we use the Dyck path approach from probability theory to build a bridge between tree space and curve space, so that the power of functional data analysis can be exploited to analyze data in tree space.
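As a rough illustration of how a Dyck path can turn a tree into a curve, the sketch below walks an ordered rooted tree depth-first and records the current depth after every step: +1 when descending an edge, -1 when returning along it. The nested-list tree representation and the function name `dyck_path` are assumptions made for this example only; they are not taken from the research itself, and the actual construction used there may differ in its details.

```python
def dyck_path(tree):
    """Return the depth-first height sequence of an ordered rooted tree.

    Hypothetical representation: a node is the list of its children,
    so a leaf is [] and the argument is the root node.
    """
    heights = [0]  # the walk starts at the root, at height 0

    def visit(node):
        for child in node:
            heights.append(heights[-1] + 1)  # step down an edge
            visit(child)
            heights.append(heights[-1] - 1)  # step back up the same edge

    visit(tree)
    return heights


# Root with two children; the first child has two leaf children.
example = [[[], []], []]
path = dyck_path(example)
print(path)
```

The resulting sequence has one more entry than twice the number of edges, never drops below zero, and returns to zero at the end, so each tree is represented as a curve-like object that standard functional data tools can handle.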