On High Dimensional Sparse Regression and Its Inference

Sun, Qiang

Last Modified

March 19, 2019

Creator

Sun, Qiang
- Affiliation: Gillings School of Global Public Health, Department of Biostatistics

Abstract

In the first part of this work, we develop a sparse projection regression modeling (SPReM) framework for multivariate regression with a large number of responses and a multivariate covariate of interest. We propose two novel heritability ratios to perform dimension reduction, response selection, estimation, and testing simultaneously, while explicitly accounting for correlations among the multivariate responses. SPReM is designed specifically to address the low statistical power of many standard approaches for high-dimensional data, such as Hotelling's $T^2$ test statistic or a mass univariate analysis. We formulate the estimation problem of SPReM as a novel sparse unit rank projection (SURP) problem and propose a fast optimization algorithm for SURP. Furthermore, we extend SURP to sparse multi-rank projection (SMURP) through a sequential SURP approximation. Theoretically, we systematically investigate the convergence properties of SURP and the convergence rate of the SURP estimates. Simulation studies and a real data analysis show that SPReM outperforms other state-of-the-art methods.

In the second part of this work, we propose a Hard Thresholded Regression (HTR) framework for simultaneous variable selection and unbiased estimation in high-dimensional linear regression. The framework is motivated by its close connection with $L_0$ regularization and best subset selection under orthogonal design, and it enjoys several key computational and theoretical advantages over many existing penalization methods (e.g., SCAD or MCP). Computationally, HTR is a fast two-stage procedure: the first stage computes a coarse initial estimator, and the second stage solves a linear program. Theoretically, under some mild conditions, the HTR estimator is shown to enjoy the strong oracle property and the thresholded property even when the number of covariates grows at an exponential rate. We also propose incorporating a regularized covariance estimator into the estimation procedure to strike a better trade-off between noise accumulation and correlation modeling. With a regularized covariance matrix, HTR includes Sure Independence Screening as a special case. Both simulation and real data results show that HTR outperforms other state-of-the-art methods.

In the third part of this work, we focus on multicategory classification and propose sparse multicategory discriminant analysis (SMDA). Many supervised machine learning tasks can be cast as multicategory classification problems. Linear discriminant analysis (LDA) has been well studied for two-class classification and can be easily extended to the multicategory case. In high-dimensional classification, however, traditional LDA fails because of diverging spectra and noise accumulation. Therefore, researchers have proposed penalized LDA (Fan et al., 2012; Witten and Tibshirani, 2011). However, most available methods for high-dimensional multi-class LDA rely on iterative algorithms, which are computationally expensive and lack theoretical justification. In this part, we present SMDA as a new framework for high-dimensional multi-class classification that extracts the discriminant directions simultaneously. Unlike other state-of-the-art methods, SMDA can be cast as a convex program. We evaluate the performance of the resulting methods in an extensive simulation study and a real data analysis.
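The abstract motivates HTR through its connection with $L_0$ regularization and best subset selection under orthogonal design. The Python sketch below illustrates only that classical connection, not the HTR estimator itself (it has no coarse initial estimator and no linear-programming stage). When the design has orthonormal columns ($X^\top X = I$) and $z = X^\top y$, we have $\|y - X\beta\|_2^2 = \|y - Xz\|_2^2 + \|\beta - z\|_2^2$. Minimizing $\tfrac12\|y - X\beta\|_2^2 + \tfrac{\lambda^2}{2}\|\beta\|_0$ therefore separates by coordinate: keeping $\beta_j = z_j$ costs $\lambda^2/2$, while setting $\beta_j = 0$ costs $z_j^2/2$. The solution is hard thresholding of $z$ at level $\lambda$. The helper names (`mat_t_vec`, `gram`, `hard_threshold`, `l0_objective`, `best_l0_by_enumeration`) and the $\lambda^2/2$ scaling of the penalty are illustrative assumptions. The brute-force check is feasible only for a handful of covariates.

```python
from itertools import combinations

# Illustrative sketch (hypothetical helper names): the orthogonal-design link
# between L0-penalized least squares and hard thresholding. This is NOT HTR.


def mat_t_vec(X, y):
    """Compute X^T y, with X given as a list of rows."""
    p = len(X[0])
    return [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]


def gram(X):
    """Compute the Gram matrix X^T X."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]


def hard_threshold(z, lam):
    """Hard-thresholding operator: keep z_j if |z_j| > lam, else set it to 0."""
    return [zj if abs(zj) > lam else 0.0 for zj in z]


def l0_objective(X, y, beta, lam):
    """Assumed objective: 0.5 * ||y - X beta||^2 + 0.5 * lam^2 * ||beta||_0."""
    resid = [yi - sum(xij * bj for xij, bj in zip(row, beta)) for row, yi in zip(X, y)]
    return 0.5 * sum(r * r for r in resid) + 0.5 * lam ** 2 * sum(b != 0 for b in beta)


def best_l0_by_enumeration(X, y, lam, tol=1e-9):
    """Exact L0-penalized least squares by enumerating all subsets.

    Requires orthonormal columns: then the least-squares fit restricted to a
    subset S is simply z_S with z = X^T y, so each subset is cheap to score.
    """
    p = len(X[0])
    G = gram(X)
    if any(abs(G[i][j] - (1.0 if i == j else 0.0)) > tol for i in range(p) for j in range(p)):
        raise ValueError("this shortcut requires X^T X = I")
    z = mat_t_vec(X, y)
    best, best_val = None, float("inf")
    for k in range(p + 1):
        for S in combinations(range(p), k):
            beta = [z[j] if j in S else 0.0 for j in range(p)]
            val = l0_objective(X, y, beta, lam)
            if val < best_val:
                best, best_val = beta, val
    return best


if __name__ == "__main__":
    # Three orthonormal columns taken from a scaled 4x4 Hadamard matrix.
    X = [[0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0.5, 0.5, -0.5], [0.5, -0.5, -0.5]]
    y = [3.0, 1.0, 2.0, 0.0]
    z = mat_t_vec(X, y)
    print(z)                                   # [3.0, 2.0, 1.0]
    print(hard_threshold(z, 1.5))              # [3.0, 2.0, 0.0]
    print(best_l0_by_enumeration(X, y, 1.5))   # [3.0, 2.0, 0.0]
```

Outside the orthogonal case the penalized problem no longer separates by coordinate, and exhaustive search grows as $2^p$. That cost is the computational obstacle the abstract's two-stage HTR procedure is meant to avoid.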

Date of publication

May 2014

DOI

https://doi.org/10.17615/1zkp-rn73

Identifier

Sun_unc_0153D_14737.pdf

Resource type

Dissertation

Rights statement

In Copyright

Advisor

Zeng, Donglin
Zhu, Hongtu
Liu, Yufeng
An, Hongyu
Ibrahim, Joseph
Kosorok, Michael

Degree

Doctor of Philosophy

Degree granting institution

University of North Carolina at Chapel Hill Graduate School

Graduation year

2014

Language

English

Publisher

University of North Carolina at Chapel Hill Graduate School

Place of publication

Chapel Hill, NC

Access right

This item is restricted from public view for 2 years after publication.

Date uploaded

April 22, 2015

Relations

Items

Sun_unc_0153D_14737.pdf (date uploaded: 2019-04-12; visibility: Public)