Sparse PCA Asymptotics and Analysis of Tree Data

Shen, Dan

Last Modified

March 20, 2019

Creator

Shen, Dan
- Affiliation: College of Arts and Sciences, Department of Statistics and Operations Research

Abstract

This research covers two major areas. The first is the asymptotic properties of Principal Component Analysis (PCA) and sparse PCA. The second is the application of functional data analysis to tree-structured data objects.

A general asymptotic framework is developed for studying the consistency properties of PCA. Assuming the spike population model, the framework considers increasing sample size, increasing dimension (the number of variables), and increasing spike sizes (the relative size of the population eigenvalues). The framework includes several previously studied domains of asymptotics as special cases and, for the first time, allows one to investigate connections and transitions among the various domains. This unification provides new theoretical insights.

Sparse PCA methods are efficient tools for reducing the dimension (the number of variables) of complex data. Sparse principal components (PCs) can be easier to interpret than conventional PCs because most of their loadings are zero. We study the asymptotic properties of these sparse PC directions for scenarios with fixed sample size and increasing dimension, i.e., the High Dimension, Low Sample Size (HDLSS) setting. We find a large set of sparsity assumptions under which sparse PCA is still consistent even when conventional PCA is strongly inconsistent. The consistency of sparse PCA is characterized along with rates of convergence. The boundaries of the consistent region are clarified using an oracle result.

Functional data analysis has been very successful in the analysis of data lying in standard Euclidean space, such as curve data. However, with recent developments in fields such as medical image analysis, data in non-Euclidean spaces, such as tree-structured data, increasingly present great challenges to statistical analysis. Here, we use the Dyck path approach from probability theory to build a bridge between tree space and curve space, so that the power of functional data analysis can be exploited to analyze data in tree space.
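As an illustration of the Dyck path idea mentioned in the abstract, the following is a minimal sketch of how a rooted, ordered tree can be turned into a curve by recording the depth visited during a depth-first walk around the tree. It is not taken from the dissertation: the nested-list tree representation and the function name `dyck_path` are assumptions made here for illustration, and the dissertation may use a different variant of the encoding.

```python
def dyck_path(tree: list) -> list:
    """Encode a rooted, ordered tree as the sequence of depths visited by a
    depth-first walk that goes down each edge once and back up once.

    Assumed (hypothetical) representation: a node is the list of its child
    subtrees, so a leaf is [] and the whole tree is its root node.
    """
    heights = [0]  # the walk starts at the root, at depth 0

    def walk(node: list, depth: int) -> None:
        for child in node:
            heights.append(depth + 1)  # step down the edge to the child
            walk(child, depth + 1)     # traverse the child's subtree
            heights.append(depth)      # step back up to the current node

    walk(tree, 0)
    return heights


# A root with two children, where the first child has one child of its own.
example_tree = [[[]], []]
print(dyck_path(example_tree))  # [0, 1, 2, 1, 0, 1, 0]
```

A tree with n nodes yields a path of 2(n - 1) + 1 heights that starts and ends at zero and never goes negative. Once such paths are brought to a common grid, they can be treated as curves, which is the kind of object functional data analysis is designed for.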

Date of publication

August 2012

DOI

https://doi.org/10.17615/40jf-4g75

Resource type

Dissertation

Rights statement

In Copyright

Advisor

Kosorok, Michael
Marron, James Stephen
Bhamidi, Shankar
Shen, Haipeng
Nobel, Andrew

Degree

Doctor of Philosophy

Degree granting institution

University of North Carolina at Chapel Hill Graduate School

Graduation year

2012

Language

English

Access right

This item is restricted from public view for 1 year after publication.

Date uploaded

April 26, 2017

Items

Title	Date Uploaded	Visibility
Shen_unc_0153D_12982.pdf	2019-04-10	Public
