Vertical integration of multiple high-dimensional datasets
- Last Modified
- March 21, 2019
- Creator
- Lock, Eric F.
- Affiliation: College of Arts and Sciences, Department of Statistics and Operations Research
- Abstract
- Research in genomics and related fields now often requires the analysis of multi-block data, in which multiple high-dimensional types of data are available for a common set of objects. We introduce Joint and Individual Variation Explained (JIVE), a general decomposition of variation for the integrated analysis of multi-block datasets. The decomposition consists of three terms: a low-rank approximation capturing joint variation across datatypes, low-rank approximations for structured variation individual to each datatype, and residual noise. JIVE quantifies the amount of joint variation between datatypes, reduces the dimensionality of the data, and allows for the visual exploration of joint and individual structure. JIVE is an extension of Principal Components Analysis and has clear advantages over popular two-block methods such as Canonical Correlation and Partial Least Squares. Research in a number of fields also requires the analysis of multi-way data. Multi-way data take the form of a three- (or higher-) dimensional array. We compare several existing factorization methods for multi-way data, and we show that these methods belong to the same unified framework. The final portion of this dissertation concerns biclustering. We introduce an approach to biclustering a binary data matrix, and discuss the application of biclustering to classification problems.
- Date of publication
- May 2012
- Rights statement
- In Copyright
- Note
- ... in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Statistics and Operations Research.
- Advisor
- Nobel, Andrew
- Degree granting institution
- University of North Carolina at Chapel Hill