Statistical integration of information

Last Modified
  • March 20, 2019
  • Feng, Qing
    • Affiliation: College of Arts and Sciences, Department of Statistics and Operations Research
  • Modern data analysis frequently involves multiple large and diverse data sets generated by current high-throughput technologies. An integrative analysis of these sources of information is very promising for improving knowledge discovery in various fields. This dissertation focuses on three distinct challenges in the integration of information. The variables obtained from diverse and novel platforms often have highly non-Gaussian marginal distributions and are therefore challenging to analyze with commonly used methods. The first part introduces an automatic transformation for improving data quality before integrating multiple data sources. For each variable, a new family of parametrizations of the shifted logarithm transformation is proposed, which allows transformation for both left and right skewness within a single family and an automatic selection of the parameter value. The second part discusses an integrative analysis of disparate data blocks measured on a common set of experimental subjects. This data integration naturally motivates the simultaneous exploration of the joint and individual variation within each data block, resulting in new insights. We introduce Non-iterative Joint and Individual Variation Explained (Non-iterative JIVE), capturing both joint and individual variation within each data block. This is a major improvement over earlier approaches to this challenge in terms of both a new conceptual understanding and a fast linear algebra computation. An important mathematical contribution is the use of score subspaces as the principal descriptors of variation structure and the use of perturbation theory as the guide for variation segmentation. Furthermore, this makes our method robust to heterogeneity among data blocks, with no need for normalization.
The last part proposes a method, inspired by Generalized Fiducial Inference, for finding a robust consensus among several independently derived confidence distributions (CDs) for a quantity of interest. The resulting fused CD is robust to the presence of potentially discrepant CDs in the collection. The method uses computationally efficient fiducial model averaging to obtain a robust consensus distribution without the need to eliminate discrepant CDs from the analysis.
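The automatic transformation described in the first part can be illustrated with a minimal Python sketch. This record does not give the dissertation's actual parametrization or selection criterion, so everything below is an assumption made for illustration: the parameter name `beta`, the way the shift is tied to the data range, the search grid, and the use of absolute sample skewness as the criterion are all hypothetical. The sketch only shows the general idea of one signed parameter covering both left and right skewness.

```python
import numpy as np


def sample_skewness(values):
    """Return the (biased) sample skewness; 0.0 for constant data."""
    centered = values - values.mean()
    scale = centered.std()
    if scale == 0:
        return 0.0
    return float(np.mean((centered / scale) ** 3))


def shifted_log(values, beta):
    """Shifted logarithm with a hypothetical signed parameter beta.

    beta > 0 targets right skew:  log(x - min + range / beta)
    beta < 0 targets left skew:  -log(max - x + range / |beta|)
    beta == 0 leaves the data unchanged.
    Both branches are increasing in x, so the ordering of the data is kept.
    """
    values = np.asarray(values, dtype=float)
    data_range = values.max() - values.min()
    if beta == 0 or data_range == 0:
        return values.copy()
    offset = data_range / abs(beta)  # larger |beta| means a stronger transform
    if beta > 0:
        return np.log(values - values.min() + offset)
    return -np.log(values.max() - values + offset)


def select_beta(values, grid=None):
    """Pick beta from a grid by minimizing absolute sample skewness.

    The criterion and the grid are assumptions of this sketch, not the
    dissertation's selection rule.
    """
    values = np.asarray(values, dtype=float)
    if grid is None:
        magnitudes = np.logspace(-2, 3, 60)
        grid = np.concatenate([-magnitudes[::-1], [0.0], magnitudes])
    scores = [abs(sample_skewness(shifted_log(values, b))) for b in grid]
    return float(grid[int(np.argmin(scores))])
```

A typical use would be to call `select_beta` on each variable separately and then apply `shifted_log` with the selected value before any integrative analysis. Because the grid includes `beta = 0`, the selected transformation never has larger absolute sample skewness than the untransformed variable.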
Date of publication
Resource type
Rights statement
  • In Copyright
  • Bhamidi, Shankar
  • Nobel, Andrew
  • Marron, James Stephen
  • Liu, Yufeng
  • Hoadley, Katherine
  • Hannig, Jan
  • Doctor of Philosophy
Degree granting institution
  • University of North Carolina at Chapel Hill Graduate School
Graduation year
  • 2016