Jingxiang
Chen
Author
Department of Biostatistics
Gillings School of Global Public Health
Machine Learning Techniques for Heterogeneous Data Sets
Over the past few decades, machine learning tools have been under rapid development in various application fields to support statistical decision making. In this dissertation, we investigate new supervised machine learning techniques that can contribute to the analysis of complex datasets.
First, we discuss a new learning method under Reproducing Kernel Hilbert Spaces (RKHS) to achieve variable selection and data extraction simultaneously. In particular, we propose a unified RKHS learning method, namely DOuble Sparsity Kernel (DOSK) learning, for this purpose. We prove that, under certain conditions, the new method can asymptotically achieve variable selection consistency. Numerical studies demonstrate that DOSK is highly competitive among existing approaches for RKHS learning.
Second, we study how machine learning can be applied to heterogeneous data analysis by detecting an optimal individual treatment rule (ITR) for the ordinal treatment case. Obtaining an optimal ITR is one of the primary goals in precision medicine. Recently, outcome weighted learning (OWL) has been proposed to estimate such an optimal ITR in a binary treatment setting by maximizing the expected clinical outcome. However, for ordinal treatment settings such as dose level finding, it is unclear how to use OWL. We propose a new technique for estimating the ITR with ordinal treatments. Simulated examples and an application to a type-2 diabetes study demonstrate the highly competitive performance of the proposed method.
Third, we also analyze heterogeneous data, but from a different point of view. In particular, we develop a new exploratory machine learning tool to identify heterogeneous subpopulations without much prior knowledge. To achieve this goal, we formulate a regression problem with subject-specific regression coefficients and use adaptive fusion to cluster the coefficients into subpopulations. This method has two main advantages. First, it relies on little prior knowledge of the underlying subpopulation structure. Second, it makes use of the outcome-predictor relationship and hence can have competitive estimation and prediction accuracy. To estimate the parameters, we design a highly efficient accelerated proximal gradient algorithm. Numerical studies show that the proposed method has competitive estimation and prediction accuracy.
Spring 2017
2017
Public health
Biostatistics
Big Data, Latent Supervised Learning, Nonparametric Statistics, Precision Medicine, Subgroup Identification, Variable Selection
eng
Doctor of Philosophy
Dissertation
University of North Carolina at Chapel Hill Graduate School
Degree granting institution
Biostatistics
Yufeng
Liu
Thesis advisor
Michael
Kosorok
Thesis advisor
Stephen
Cole
Thesis advisor
Eric
Laber
Thesis advisor
Donglin
Zeng
Thesis advisor
text
Jingxiang
Chen
Creator
Department of Biostatistics
Gillings School of Global Public Health
Machine Learning Techniques for Heterogeneous Data Sets
Over the past few decades, machine learning tools have been under rapid development in various application fields to support statistical decision making. In this dissertation, we investigate new supervised machine learning techniques that can contribute to the analysis of complex datasets. First, we discuss a new learning method under Reproducing Kernel Hilbert Spaces (RKHS) to achieve variable selection and data extraction simultaneously. In particular, we propose a unified RKHS learning method, namely DOuble Sparsity Kernel (DOSK) learning, to overcome this challenge. We prove that, under certain conditions, our new method can asymptotically achieve variable selection consistency. Numerical study results demonstrate that DOSK is highly competitive among existing approaches for RKHS learning. Second, we study how machine learning can be applied to heterogeneous data analysis by detecting an optimal individual treatment rule for the ordinal treatment case. One of the primary goals in precision medicine is to obtain an optimal individual treatment rule (ITR). Recently, outcome weighted learning (OWL) has been proposed to estimate such an optimal ITR in a binary treatment setting by maximizing the expected clinical outcome. However, for ordinal treatment settings such as dose level finding, it is unclear how to use OWL. We propose a new technique for estimating ITRs with ordinal treatments. Simulated examples and an application to a type-2 diabetes study demonstrate the highly competitive performance of the proposed method. Third, we also analyze heterogeneous data, but from a different point of view. In particular, we develop a new exploratory machine learning tool to identify heterogeneous subpopulations without much prior knowledge. To achieve this goal, we formulate a regression problem with subject-specific regression coefficients and use adaptive fusion to cluster the coefficients into subpopulations. This method has two main advantages.
First, it relies on little prior knowledge of the underlying subpopulation structure. Second, it makes use of the outcome-predictor relationship and hence can have competitive estimation and prediction accuracy. To estimate the parameters, we design a highly efficient accelerated proximal gradient algorithm. Numerical studies show that the proposed method has competitive estimation and prediction accuracy.
2017
Public health
Biostatistics
Big Data, Latent Supervised Learning, Nonparametric Statistics, Precision Medicine, Subgroup Identification, Variable Selection
eng
Doctor of Philosophy
Dissertation
University of North Carolina at Chapel Hill Graduate School
Degree granting institution
Biostatistics
Yufeng
Liu
Thesis advisor
Michael
Kosorok
Thesis advisor
Stephen
Cole
Thesis advisor
Eric
Laber
Thesis advisor
Donglin
Zeng
Thesis advisor
text
2017-05
Jingxiang
Chen
Creator
Department of Biostatistics
Gillings School of Global Public Health
Machine Learning Techniques for Heterogeneous Data Sets
Over the past few decades, machine learning tools have been under rapid development in various application fields to support statistical decision making. In this dissertation, we investigate new supervised machine learning techniques that can contribute to the analysis of complex datasets. First, we discuss a new learning method under Reproducing Kernel Hilbert Spaces (RKHS) to achieve variable selection and data extraction simultaneously. In particular, we propose a unified RKHS learning method, namely DOuble Sparsity Kernel (DOSK) learning, to overcome this challenge. We prove that, under certain conditions, our new method can asymptotically achieve variable selection consistency. Numerical study results demonstrate that DOSK is highly competitive among existing approaches for RKHS learning. Second, we study how machine learning can be applied to heterogeneous data analysis by detecting an optimal individual treatment rule for the ordinal treatment case. One of the primary goals in precision medicine is to obtain an optimal individual treatment rule (ITR). Recently, outcome weighted learning (OWL) has been proposed to estimate such an optimal ITR in a binary treatment setting by maximizing the expected clinical outcome. However, for ordinal treatment settings such as dose level finding, it is unclear how to use OWL. We propose a new technique for estimating ITRs with ordinal treatments. Simulated examples and an application to a type-2 diabetes study demonstrate the highly competitive performance of the proposed method. Third, we also analyze heterogeneous data, but from a different point of view. In particular, we develop a new exploratory machine learning tool to identify heterogeneous subpopulations without much prior knowledge. To achieve this goal, we formulate a regression problem with subject-specific regression coefficients and use adaptive fusion to cluster the coefficients into subpopulations. This method has two main advantages.
First, it relies on little prior knowledge of the underlying subpopulation structure. Second, it makes use of the outcome-predictor relationship and hence can have competitive estimation and prediction accuracy. To estimate the parameters, we design a highly efficient accelerated proximal gradient algorithm. Numerical studies show that the proposed method has competitive estimation and prediction accuracy.
2017
Public health
Biostatistics
Big Data, Latent Supervised Learning, Nonparametric Statistics, Precision Medicine, Subgroup Identification, Variable Selection
eng
Doctor of Philosophy
Dissertation
Biostatistics
Yufeng
Liu
Thesis advisor
Michael
Kosorok
Thesis advisor
Stephen
Cole
Thesis advisor
Eric
Laber
Thesis advisor
Donglin
Zeng
Thesis advisor
text
2017-05
University of North Carolina at Chapel Hill
Degree granting institution
Jingxiang
Chen
Creator
Department of Biostatistics
Gillings School of Global Public Health
Machine Learning Techniques for Heterogeneous Data Sets
Over the past few decades, machine learning tools have been under rapid development in various application fields to support statistical decision making. In this dissertation, we investigate new supervised machine learning techniques that can contribute to the analysis of complex datasets. First, we discuss a new learning method under Reproducing Kernel Hilbert Spaces (RKHS) to achieve variable selection and data extraction simultaneously. In particular, we propose a unified RKHS learning method, namely DOuble Sparsity Kernel (DOSK) learning, to overcome this challenge. We prove that, under certain conditions, our new method can asymptotically achieve variable selection consistency. Numerical study results demonstrate that DOSK is highly competitive among existing approaches for RKHS learning. Second, we study how machine learning can be applied to heterogeneous data analysis by detecting an optimal individual treatment rule for the ordinal treatment case. One of the primary goals in precision medicine is to obtain an optimal individual treatment rule (ITR). Recently, outcome weighted learning (OWL) has been proposed to estimate such an optimal ITR in a binary treatment setting by maximizing the expected clinical outcome. However, for ordinal treatment settings such as dose level finding, it is unclear how to use OWL. We propose a new technique for estimating ITRs with ordinal treatments. Simulated examples and an application to a type-2 diabetes study demonstrate the highly competitive performance of the proposed method. Third, we also analyze heterogeneous data, but from a different point of view. In particular, we develop a new exploratory machine learning tool to identify heterogeneous subpopulations without much prior knowledge. To achieve this goal, we formulate a regression problem with subject-specific regression coefficients and use adaptive fusion to cluster the coefficients into subpopulations. This method has two main advantages.
First, it relies on little prior knowledge of the underlying subpopulation structure. Second, it makes use of the outcome-predictor relationship and hence can have competitive estimation and prediction accuracy. To estimate the parameters, we design a highly efficient accelerated proximal gradient algorithm. Numerical studies show that the proposed method has competitive estimation and prediction accuracy.
2017
Public health
Biostatistics
Big Data; Latent Supervised Learning; Nonparametric Statistics; Precision Medicine; Subgroup Identification; Variable Selection
eng
Doctor of Philosophy
Dissertation
Yufeng
Liu
Thesis advisor
Michael
Kosorok
Thesis advisor
Stephen
Cole
Thesis advisor
Eric
Laber
Thesis advisor
Donglin
Zeng
Thesis advisor
text
2017-05
University of North Carolina at Chapel Hill
Degree granting institution
Chen_unc_0153D_17154.pdf
uuid:70d9a95c-9c7b-4bfc-97ef-b31aef91ece8
2019-08-15T00:00:00
proquest
2017-06-17T14:37:56Z
application/pdf
2117489
yes