Methods of Association Mining by Variable-to-Set Affinity Testing

Bodwin, Kelly

Last Modified

March 20, 2019

Creator

Bodwin, Kelly
- Affiliation: College of Arts and Sciences, Department of Statistics and Operations Research

Abstract

Statistical data mining refers to methods for identifying and validating interesting patterns from an overabundance of data. Data mining tasks in which the objective involves pairwise relationships between variables are known as association mining. In general, features sought by association mining methods are sets of variables, often small subsets of a larger collection, that are more associated internally than externally. Methods vary in both the measure of association that is studied and the algorithm by which associated sets are identified. This dissertation presents a generalized framework for association mining called Variable-to-Set Affinity Testing (VSAT). Unlike conventional techniques for clustering or community detection, which usually maximize a score from a dissimilarity or adjacency matrix, the VSAT approach is an adaptive procedure grounded in statistical hypothesis testing principles. The framework is adaptable to a broad class of measurements for variable relationships and is equipped with theoretical guarantees of error control. This dissertation also presents in detail two new association mining methods built in the VSAT framework. The first, Differential Correlation Mining (DCM), identifies variable sets that have higher average pairwise correlation in one sample condition than in another. Such artifacts are of scientific interest in many fields, including statistical genetics and neuroscience. Differential Correlation Mining is applied to high-dimensional data sets in these two fields. The second method, Coherent Set Mining (CSM), is a novel approach to association mining in binary data. Dichotomous observations are assumed to derive from a latent variable of interest via thresholding. The Coherent Set Mining method identifies variable sets that are strongly associated in the latent measure, despite distortions in the association structure of the observed data due to the thresholding process. Coherent Set Mining is applied to problems in text mining, statistical genetics, and product recommendation.
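
As a rough illustration of the quantity that Differential Correlation Mining is described as targeting, the sketch below computes the difference in average pairwise correlation of a fixed variable set between two sample conditions. It is not the dissertation's algorithm: it performs no search over sets and no hypothesis testing, and the function names, the simulated data, and the candidate set are all assumptions made for this example.

```python
import numpy as np

def mean_pairwise_correlation(X, variables):
    """Average off-diagonal sample correlation among the chosen columns of X."""
    corr = np.corrcoef(X[:, variables], rowvar=False)
    k = len(variables)
    # Sum of all entries minus the diagonal leaves the k*(k-1) off-diagonal terms.
    return (corr.sum() - np.trace(corr)) / (k * (k - 1))

def differential_correlation(X1, X2, variables):
    """Average pairwise correlation in condition 1 minus that in condition 2."""
    return (mean_pairwise_correlation(X1, variables)
            - mean_pairwise_correlation(X2, variables))

# Simulated data (hypothetical): rows are samples, columns are variables.
rng = np.random.default_rng(0)
n, p = 200, 20
candidate = [0, 1, 2, 3, 4]

X2 = rng.standard_normal((n, p))   # condition 2: all variables independent
X1 = rng.standard_normal((n, p))   # condition 1: candidate set shares a factor
X1[:, candidate] += rng.standard_normal((n, 1))

score = differential_correlation(X1, X2, candidate)
print(f"differential correlation of candidate set: {score:.3f}")
```

In this simulation the candidate variables share a common factor only in the first condition, so the score should be clearly positive, while a set of unrelated variables would score near zero.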

Date of publication

May 2017

DOI

https://doi.org/10.17615/6s5a-bp25

Resource type

Dissertation

Rights statement

In Copyright

Advisor

Nobel, Andrew
Marron, James Stephen
Bhamidi, Shankar
Zhang, Kai
Xia, Yin

Degree

Doctor of Philosophy

Degree granting institution

University of North Carolina at Chapel Hill Graduate School

Graduation year

2017

Language

English

Items

Title	Date Uploaded	Visibility
Bodwin_unc_0153D_17074.pdf	2019-04-12	Public
