Gaussian Centered L-moments

Last Modified
  • March 20, 2019
Creator
  • An, Hyowon
    • Affiliation: College of Arts and Sciences, Department of Statistics and Operations Research
Abstract
  • As various types of media now generate so-called big data, data visualization faces the challenge of selecting a few representative variables that best summarize the important structure inherent in the data. Conventional moment-based summary statistics can be useful for such variable screening. In particular, they can detect important distributional features such as bimodality and skewness. However, their sensitivity to outliers can lead to selection driven by a few extreme outliers rather than by distributional shape. To address this non-robustness, we consider the L-moments. Describing a marginal distribution with the L-moments nevertheless has an intuitive limitation in practice, because these moments take zero values at the uniform distribution; the interest usually lies in the shape of the marginal distribution relative to the Gaussian, and the sign and magnitude of the L-moments are not as useful as expected for this purpose. As a remedy, we propose the Gaussian Centered L-moments, which have zeros at the Gaussian distribution while sharing the robustness of the L-moments. The Gaussian Centered L-moments can be especially useful for gene expression data, in which variable screening corresponds to finding biologically meaningful genes. Mixtures of Gaussian distributions seem to be the underlying mechanism generating gene expression profiles, which calls for moments that are sensitive to departure from Gaussianity. This dissertation investigates the theoretical properties of the Gaussian Centered L-moments in depth. First, by means of Oja's criteria, the first four Gaussian Centered L-moments are shown to describe the shape of a distribution in a physically meaningful fashion. Second, the robustness of the conventional, L- and Gaussian Centered L-moments is compared based on the asymptotic behavior of their influence functions on Tukey's h distributions. Third, the efficiencies of these moments in capturing departure from Gaussianity are compared by developing Jarque-Bera type goodness-of-fit test statistics for Gaussianity. While developing these test statistics, a method for obtaining an optimal balance between skewness and kurtosis estimators is introduced. Finally, the overall performance of the different moments on high-dimensional gene expression data, including both robustness and efficiency, is analyzed by Gene Set Enrichment Analysis.
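The abstract's remark that the L-moments are centered at the uniform rather than the Gaussian distribution can be illustrated with a minimal sketch. It computes conventional sample L-moment ratios from probability-weighted moments (the standard unbiased estimators) and compares the L-kurtosis of uniform and Gaussian samples. It does not implement the Gaussian Centered L-moments themselves, whose definition is not given in this record. The function name `sample_l_moment_ratios`, the sample size, and the seed are illustrative assumptions.

```python
import math
import random


def sample_l_moment_ratios(values):
    """Return (l1, l2, t3, t4): the first two sample L-moments and the
    L-skewness and L-kurtosis ratios, via probability-weighted moments.
    Assumes the observations are not all identical (otherwise l2 is zero)."""
    x = sorted(float(v) for v in values)
    n = len(x)
    if n < 4:
        raise ValueError("need at least four observations")
    b0 = b1 = b2 = b3 = 0.0
    for i, xi in enumerate(x, start=1):
        # Weights for the unbiased probability-weighted moments b_1, b_2, b_3.
        w1 = (i - 1) / (n - 1)
        w2 = w1 * (i - 2) / (n - 2)
        w3 = w2 * (i - 3) / (n - 3)
        b0 += xi
        b1 += w1 * xi
        b2 += w2 * xi
        b3 += w3 * xi
    b0, b1, b2, b3 = (b / n for b in (b0, b1, b2, b3))
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2


# Population L-kurtosis of the Gaussian distribution (a known closed form).
GAUSSIAN_L_KURTOSIS = 30 / math.pi * math.atan(math.sqrt(2)) - 9

rng = random.Random(0)  # fixed seed, chosen only for reproducibility
n = 50_000
uniform_t4 = sample_l_moment_ratios(rng.random() for _ in range(n))[3]
gauss_t4 = sample_l_moment_ratios(rng.gauss(0.0, 1.0) for _ in range(n))[3]

print(f"L-kurtosis, uniform sample:  {uniform_t4:.4f}")
print(f"L-kurtosis, Gaussian sample: {gauss_t4:.4f}")
print(f"L-kurtosis, Gaussian theory: {GAUSSIAN_L_KURTOSIS:.4f}")
```

Under these assumptions the uniform sample's L-kurtosis should sit near zero, while the Gaussian sample's sits near a positive constant. A distribution whose tails are lighter than Gaussian but heavier than uniform would therefore still report a positive L-kurtosis, which is the interpretive difficulty the abstract describes.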
Rights statement
  • In Copyright
Advisor
  • Fraiman, Nicolas
  • Marron, James Stephen
  • Hannig, Jan
  • Zhang, Kai
  • Carlstein, Edward
Degree
  • Doctor of Philosophy
Degree granting institution
  • University of North Carolina at Chapel Hill Graduate School
Graduation year
  • 2017