Statistical Methods for Mass Spectrometry Proteomics

O'Brien, Jonathon

Last Modified

March 20, 2019

Creator

O'Brien, Jonathon
- Affiliation: Gillings School of Global Public Health, Department of Biostatistics

Abstract

"DNA makes RNA makes proteins" is the central dogma of molecular biology. While the measurement of RNA has dominated the landscape of scientific inquiry for many years, often the true outcome of interest is the final protein product. Microarray and RNAseq studies do not tell researchers anything about what happens during and after translation. For this reason, interest in directly measuring the proteome has flourished. Unfortunately, the direct analysis of proteins often creates a complicated inferential situation. When scientists want to see the whole proteome (or at least a large unknown sample of the proteome), mass spectrometry is often the most powerful technology available. Mass spectrometers allow researchers to separate proteins from complex samples and obtain information about the relative abundance of around 10,000 proteins in a given experiment. However, the analysis of mass spectrometry proteomics data involves a complicated statistical inference problem. Inference is made on relative protein abundance by examining protein fragments called peptides. This inference problem is complicated by the two intrinsic statistical difficulties of proteomics: matched pairs and non-ignorable missingness, which combine to create unexpected challenges for statisticians. Here I will discuss the complexities of modeling mass spectrometry proteomics and provide new methods to improve both the accuracy and depth of protein estimation. Beyond point estimation, great interest has developed in the proteomics community regarding the clustering of high-throughput data. Although the strange nature of proteomics data likely causes unique problems for clustering algorithms, we found that work needed to be done regarding the statistical interpretation of clustering before any special cases could be considered. For this reason, we have explored clustering from a statistical framework and used this foundation to establish new measures of clustering performance. These indices allow for the interpretation of a clustering problem in the commonly understood framework of sensitivity and specificity.

Date of publication

May 2016

DOI

https://doi.org/10.17615/bd50-t323

Resource type

Dissertation

Rights statement

In Copyright

Advisor

Ibrahim, Joseph
Thomas, Nancy
Qaqish, Bahjat
Sun, Wei
Chen, Mengjie

Degree

Doctor of Philosophy

Degree granting institution

University of North Carolina at Chapel Hill Graduate School

Graduation year

2016

Language

English

Items

Title	Date Uploaded	Visibility
OBrien_unc_0153D_15967.pdf	2019-04-07	Public
