ingest
cdrApp
2018-03-15T16:02:11.114Z
d591f2cd-3da7-4b31-9dd8-ee27dcb6a3ee
modifyDatastreamByValue
RELS-EXT
fedoraAdmin
2018-03-15T16:03:00.313Z
Setting exclusive relation
addDatastream
MD_TECHNICAL
fedoraAdmin
2018-03-15T16:03:11.502Z
Adding technical metadata derived by FITS
addDatastream
MD_FULL_TEXT
fedoraAdmin
2018-03-15T16:03:35.159Z
Adding full text metadata extracted by Apache Tika
modifyDatastreamByValue
RELS-EXT
fedoraAdmin
2018-03-15T16:03:56.981Z
Setting exclusive relation
modifyDatastreamByValue
MD_DESCRIPTIVE
cdrApp
2018-05-17T18:49:59.964Z
modifyDatastreamByValue
MD_DESCRIPTIVE
cdrApp
2018-07-11T05:40:01.677Z
modifyDatastreamByValue
MD_DESCRIPTIVE
cdrApp
2018-07-18T01:54:14.303Z
modifyDatastreamByValue
MD_DESCRIPTIVE
cdrApp
2018-08-16T15:05:41.943Z
modifyDatastreamByValue
MD_DESCRIPTIVE
cdrApp
2018-09-27T01:39:10.052Z
modifyDatastreamByValue
MD_DESCRIPTIVE
cdrApp
2018-10-12T02:07:40.477Z
modifyDatastreamByValue
MD_DESCRIPTIVE
cdrApp
2019-03-20T20:25:49.178Z
Tianxiang
Gao
Author
Department of Computer Science
College of Arts and Sciences
Extracting information from deep learning models for computational biology
The advances in deep learning technologies in this decade are providing powerful tools for many machine learning tasks. Deep learning models, in contrast to traditional linear models, can learn nonlinear functions and high-order features, which enables exceptional performance. In the field of computational biology, the rapid growth of data scale and complexity increases the demand for powerful deep-learning-based tools. Despite the success of deep learning methods, the reasons for their effectiveness and the interpretation of the resulting models remain elusive.
This dissertation provides several approaches to extracting information from deep models; this information can be used to address the problems of model complexity and model interpretability.
The amount of data needed to train a model depends on the complexity of the model. The cost of generating data in biology is typically large. Hence, collecting data on a scale comparable to other deep learning application areas, such as computer vision and speech understanding, is prohibitively expensive, and datasets are, consequently, small. Training models of high complexity on small datasets can result in overfitting: the model over-explains the observed data and predicts poorly on unobserved data. The number of parameters in a model is often regarded as its complexity. However, deep learning models usually have thousands to millions of parameters, yet they are still capable of yielding meaningful results and avoiding overfitting even on modest datasets. To explain this phenomenon, I propose a method to estimate the degrees of freedom -- a proper estimate of the complexity -- of deep learning models. My results show that the actual complexity of a deep learning model is much smaller than its number of parameters. Using this measure of complexity, I propose a new model selection score that obviates the need for cross-validation.
Another concern for deep learning models is the ability to extract comprehensible knowledge from the model. In linear models, the coefficient corresponding to an input variable represents that variable’s influence on the prediction. However, in a deep neural network, the relationship between input and output is much more complex. In biological and medical applications, this lack of interpretability prevents deep neural networks from being a source of new scientific knowledge. To address this problem, I provide 1) a framework to select hypotheses about perturbations that lead to the largest phenotypic change, and 2) a novel auto-encoder with guided training that selects a representation of a biological system informative of a target phenotype. Case studies in computational biology illustrate the success of both methods.
Winter 2017
2017
Computer science
deep learning, hypotheses generation, informative representation, interpretable learning, machine learning, model complexity
eng
Doctor of Philosophy
Dissertation
University of North Carolina at Chapel Hill Graduate School
Degree granting institution
Computer Science
Vladimir
Jojic
Thesis advisor
Jeffery L.
Dangl
Thesis advisor
Leonard
McMillan
Thesis advisor
Marc
Niethammer
Thesis advisor
Mohit
Bansal
Thesis advisor
text
Gao_unc_0153D_17407.pdf
uuid:800e9020-fb59-4040-a6e2-2a1b7be5e3ef
2019-12-31T00:00:00
2017-12-11T03:52:20Z
proquest
application/pdf
2264167