Statistical Methods for Mass Spectrometry Proteomics Public Deposited

Downloadable Content

Download PDF
Last Modified
  • March 20, 2019
  • O'Brien, Jonathon
    • Affiliation: Gillings School of Global Public Health, Department of Biostatistics
  • DNA makes RNA makes proteins is the central dogma of molecular biology. While the measurement of RNA has dominated the landscape of scientificc inquiry for many years, often the true outcome of interest is the final protein product. Microarray and RNAseq studies do not tell researchers anything about what happens during and after translation. For this reason interest in directly measuring the proteome has flourished. Unfortunately the direct analysis of proteins often creates a complicated inferential situation. When scientists want to see the whole proteome (or at least a large unknown sample of the proteome) mass spectrometry is often the most powerful technology available. Mass spectrometers allow researchers to separate proteins from complex samples and obtain information about the relative abundance of around 10,000 proteins in a given experiment. However the analysis of mass spectrometry proteomics data involves a complicated statistical inference problem. Inference is made on relative protein abundance by examining protein fragments called peptides. This inference problem is complicated by the two intrinsic statistical didifficulties of proteomics; matched pairs and non-ignorable missingness, which combine to create unexpected challenges for statisticians. Here I will discuss the complexities of modeling mass spectrometry proteomics and provide new methods to improve both the accuracy and depth of protein estimation. Beyond point estimation, great interest has developed in the proteomics community regarding the clustering of high throughput data. Although the strange nature of proteomics data likely causes unique problems for clustering algorithms, we found that work needed to be done regarding the statistical interpretation of clustering before any special cases could be considered. For this reason we have explored clustering from a statistical framework and used this foundation to establish new measures of clustering performance. These indices allow for the interpretation of a clustering problem in the commonly understood framework of sensitivity and specificity.
Date of publication
Resource type
Rights statement
  • In Copyright
  • Ibrahim, Joseph
  • Thomas, Nancy
  • Qaqish, Bahjat
  • Sun, Wei
  • Chen, Mengjie
  • Doctor of Philosophy
Degree granting institution
  • University of North Carolina at Chapel Hill Graduate School
Graduation year
  • 2016

This work has no parents.