New Statistical Learning Approaches with Applications to RNA-seq Data

Kimes, Patrick

Last Modified

March 19, 2019

Creator

Kimes, Patrick
- Affiliation: College of Arts and Sciences, Department of Statistics and Operations Research

Abstract

This dissertation examines statistical learning problems in both the supervised and unsupervised settings. It is composed of three major parts. In the first two, we address the important question of the significance of clustering, and in the third, we describe a novel framework for unifying hard and soft classification through a spectrum of binary learning problems. In the unsupervised task of clustering, determining whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation has been a critical and challenging question. We introduce two new methods for addressing this question using statistical significance. In the first part, we describe SigFuge, an approach for identifying genomic loci exhibiting differential transcription patterns across many RNA-seq samples. In the second part, we describe statistical Significance of Hierarchical Clustering (SHC), a Monte Carlo-based approach for testing significance in hierarchical clustering, and demonstrate the power of the method to identify significant clustering using two cancer gene expression datasets. Both methods were implemented and made available as open-source R packages. In the final part, we propose a spectrum of supervised learning problems that spans the hard and soft classification tasks, based on fitting multiple decision rules to a dataset. By doing so, we reveal a novel collection of binary supervised learning problems. We study these problems using the framework of large-margin classification and a class of piecewise linear surrogate losses, for which we derive statistical properties. We evaluate our approach using simulations and a magnetic resonance imaging (MRI) dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study.

Date of publication

August 2015

DOI

https://doi.org/10.17615/pht4-y031

Identifier

Kimes_unc_0153D_15578.pdf

Resource type

Dissertation

Rights statement

In Copyright

Advisor

Liu, Yufeng
Hayes, David
Hannig, Jan
Marron, James Stephen
Zhang, Kai

Degree

Doctor of Philosophy

Degree granting institution

University of North Carolina at Chapel Hill Graduate School

Graduation year

2015

Language

English

Publisher

University of North Carolina at Chapel Hill Graduate School

Place of publication

Chapel Hill, NC

Access right

There are no restrictions to this item.

Date uploaded

August 25, 2015

Items

Title	Date Uploaded	Visibility
Kimes_unc_0153D_15578.pdf	2019-04-09	Public
