ingest cdrApp 2017-08-15T21:26:07.772Z d91e81c8-5a8a-4e8a-976c-cad4e396e5ee
modifyDatastreamByValue RELS-EXT fedoraAdmin 2017-08-15T21:26:52.715Z Setting exclusive relation
modifyDatastreamByValue RELS-EXT fedoraAdmin 2017-08-15T21:27:01.864Z Setting exclusive relation
addDatastream MD_TECHNICAL fedoraAdmin 2017-08-15T21:27:10.919Z Adding technical metadata derived by FITS
modifyDatastreamByValue RELS-EXT fedoraAdmin 2017-08-15T21:27:28.822Z Setting exclusive relation
addDatastream MD_FULL_TEXT fedoraAdmin 2017-08-15T21:27:41.350Z Adding full text metadata extracted by Apache Tika
modifyDatastreamByValue RELS-EXT fedoraAdmin 2017-08-15T21:27:42.280Z Setting exclusive relation
modifyDatastreamByValue RELS-EXT cdrApp 2017-08-22T13:57:45.430Z Setting exclusive relation
modifyDatastreamByValue MD_DESCRIPTIVE cdrApp 2018-01-25T16:54:47.478Z
modifyDatastreamByValue MD_DESCRIPTIVE cdrApp 2018-01-27T16:39:10.626Z
modifyDatastreamByValue MD_DESCRIPTIVE cdrApp 2018-03-14T14:18:56.515Z
modifyDatastreamByValue MD_DESCRIPTIVE cdrApp 2018-05-18T16:51:50.070Z
modifyDatastreamByValue MD_DESCRIPTIVE cdrApp 2018-07-11T13:00:20.525Z
modifyDatastreamByValue MD_DESCRIPTIVE cdrApp 2018-07-18T08:47:53.863Z
modifyDatastreamByValue MD_DESCRIPTIVE cdrApp 2018-08-17T14:43:55.687Z
modifyDatastreamByValue MD_DESCRIPTIVE cdrApp 2018-08-21T17:36:35.922Z
modifyDatastreamByValue MD_DESCRIPTIVE cdrApp 2018-09-27T17:45:53.438Z
modifyDatastreamByValue MD_DESCRIPTIVE cdrApp 2018-10-12T08:53:45.446Z
modifyDatastreamByValue MD_DESCRIPTIVE cdrApp 2018-10-17T14:11:06.100Z
modifyDatastreamByValue MD_DESCRIPTIVE cdrApp 2019-03-21T18:55:07.304Z
Jingxiang Chen Author
Department of Biostatistics Gillings School of Global Public Health
Machine Learning Techniques for Heterogeneous Data Sets
Over the past few decades, machine learning tools have developed rapidly across application fields to support statistical decision making.
In this dissertation, we investigate new supervised machine learning techniques that can contribute to the analysis of complex datasets. First, we discuss a new learning method under Reproducing Kernel Hilbert Spaces (RKHS) to achieve variable selection and data extraction simultaneously. In particular, we propose a unified RKHS learning method, namely DOuble Sparsity Kernel (DOSK) learning, to overcome this challenge. We prove that under certain conditions, our new method can asymptotically achieve variable selection consistency. Numerical study results demonstrate that DOSK is highly competitive among existing approaches for RKHS learning. Second, we study how machine learning can be applied to heterogeneous data analysis by detecting an optimal individual treatment rule for the ordinal treatment case. One of the primary goals in precision medicine is to obtain an optimal individual treatment rule (ITR). Recently, outcome weighted learning (OWL) has been proposed to estimate such an optimal ITR in a binary treatment setting by maximizing the expected clinical outcome. However, for ordinal treatment settings such as dose level finding, it is unclear how to use OWL. We propose a new technique for estimating the ITR with ordinal treatments. Simulated examples and an application to a type-2 diabetes study demonstrate the highly competitive performance of the proposed method. Third, we also analyze heterogeneous data, but from a different point of view. In particular, we develop a new exploratory machine learning tool to identify heterogeneous subpopulations without much prior knowledge. To achieve this goal, we formulate a regression problem with subject-specific regression coefficients and use adaptive fusion to cluster the coefficients into subpopulations. This method has two main advantages. First, it relies on little prior knowledge of the underlying subpopulation structure.
Second, it makes use of the outcome-predictor relationship and hence can have competitive estimation and prediction accuracy. To estimate the parameters, we design a highly efficient accelerated proximal gradient algorithm. Numerical studies show that the proposed method has competitive estimation and prediction accuracy.
Spring 2017 2017
Public health Biostatistics
Big Data, Latent Supervised Learning, Nonparametric Statistics, Precision Medicine, Subgroup Identification, Variable Selection
eng
Doctor of Philosophy Dissertation
University of North Carolina at Chapel Hill Graduate School Degree granting institution
Biostatistics
Yufeng Liu Thesis advisor
Michael Kosorok Thesis advisor
Stephen Cole Thesis advisor
Eric Laber Thesis advisor
Donglin Zeng Thesis advisor
text
We propose a new technique for estimating ITR with ordinal treatments. Simulated examples and an application to a type-2 diabetes study demonstrate the highly competitive performance of the proposed method. Third, we also focus on analyzing the heterogeneous data but in a different point of view. In particular, we develop a new exploratory machine learning tool to identify the heterogeneous subpopulations without much prior knowledge. To achieve this goal, we formulate a regression problem with subject specific regression coefficients and use adaptive fusion to cluster the coefficients into subpopulations. This method has two main advantages. First, it relies on little prior knowledge on the underlying subpopulation structure. Second, it makes use of the outcome-predictor relationship and hence can have competitive estimation and prediction accuracy. To estimate the parameters, we design a highly efficient accelerated proximal gradient algorithm. Numerical studies show that the proposed method has competitive estimation and prediction accuracy. 2017 Public health Biostatistics Big Data, Latent Supervised Learning, Nonparametric Statsitics, Precision Medicine, Subgroup Identification, Varaible Selection eng Doctor of Philosophy Dissertation Biostatistics Yufeng Liu Thesis advisor Michael Kosorok Thesis advisor Stephen Cole Thesis advisor Eric Laber Thesis advisor Donglin Zeng Thesis advisor text 2017-05 University of North Carolina at Chapel Hill Degree granting institution Jingxiang Chen Creator Department of Biostatistics Gillings School of Global Public Health Machine Learning Techniques for Heterogeneous Data Sets Over the past few decades, machine learning tools are under rapid development in various application fields to support statistical decision making. In this dissertation, we aim at investigating new supervised machine learning techniques which can contribute to analysis of complex datasets. 
First, we discuss a new learning method under Reproducing Kernel Hilbert Spaces (RKHS) to achieve variable selection and data extraction simultaneously. In particular, we propose a unified RKHS learning method, namely, DOuble Sparsity Kernel (DOSK) learning, to overcome this challenge. We prove that under certain conditions, our new method can asymptotically achieve variable selection consistency. Numerical study results demonstrate that DOSK is highly competitive among existing approaches for RKHS learning. Second, we study on how machine learning can be applied to heterogeneous data analysis by detecting an optimal individual treatment rule for the ordinal treatment case. One of the primary goals in precision medicine is to obtain an optimal individual treatment rule (ITR). Recently, outcome weighted learning (OWL) has been proposed to estimate such an optimal ITR in a binary treatment setting by maximizing the expected clinical outcome. However, for the ordinal treatment settings such as dose level finding, it is unclear how to use OWL. We propose a new technique for estimating ITR with ordinal treatments. Simulated examples and an application to a type-2 diabetes study demonstrate the highly competitive performance of the proposed method. Third, we also focus on analyzing the heterogeneous data but in a different point of view. In particular, we develop a new exploratory machine learning tool to identify the heterogeneous subpopulations without much prior knowledge. To achieve this goal, we formulate a regression problem with subject specific regression coefficients and use adaptive fusion to cluster the coefficients into subpopulations. This method has two main advantages. First, it relies on little prior knowledge on the underlying subpopulation structure. Second, it makes use of the outcome-predictor relationship and hence can have competitive estimation and prediction accuracy. 
To estimate the parameters, we design a highly efficient accelerated proximal gradient algorithm. Numerical studies show that the proposed method has competitive estimation and prediction accuracy. 2017 Public health Biostatistics Big Data; Latent Supervised Learning; Nonparametric Statsitics; Precision Medicine; Subgroup Identification; Varaible Selection eng Doctor of Philosophy Dissertation Yufeng Liu Thesis advisor Michael Kosorok Thesis advisor Stephen Cole Thesis advisor Eric Laber Thesis advisor Donglin Zeng Thesis advisor text 2017-05 University of North Carolina at Chapel Hill Degree granting institution Chen_unc_0153D_17154.pdf uuid:70d9a95c-9c7b-4bfc-97ef-b31aef91ece8 2019-08-15T00:00:00 proquest 2017-06-17T14:37:56Z application/pdf 2117489 yes