Statistical methods for analysis of genetic data

Last Modified
  • March 21, 2019
  • Cabanski, Christopher R.
    • Affiliation: College of Arts and Sciences, Department of Statistics and Operations Research
  • Genetic studies of gene expression typically aim to identify a set of genes that are associated with a disease, such as a specific cancer type. A single microarray or next-generation sequencing experiment can simultaneously measure the expression of tens of thousands of genes. In high-dimensional gene expression data, clusters often represent biological quantities of interest, such as tumor subtypes. In this dissertation, we describe the Standardized WithIn Class Sum of Squares (SWISS), a statistical tool that quantifies how well a high-dimensional data set clusters into predefined classes. We show that SWISS is useful in genetic studies for comparing two different processing methods applied to the same data set, because it indicates which method yields better relative separation between classes. We also investigate the asymptotic behavior of SWISS in the High Dimension, Low Sample Size setting, where the sample size is fixed and the dimension grows. Next-generation sequencing is rapidly becoming the technology of choice for genomic studies; it allows millions of DNA fragments to be sequenced simultaneously. However, the technology is not error-free and occasionally calls an incorrect base. Each base call comes with a quality score that corresponds to the probability that the called base is incorrect. In the second half of this dissertation, we show that these quality scores do not accurately represent the probability of a sequencing error. We describe a method that recalibrates the quality scores and show that the recalibrated scores are more accurate and better at discriminating sequencing errors from non-errors.
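A minimal sketch of a SWISS-style score, assuming (from the name "Standardized WithIn Class Sum of Squares") that it is the total within-class sum of squares divided by the total sum of squares around the overall mean, summed over all genes. The function name `swiss` and the toy data are illustrative assumptions, not code from the dissertation. Under this definition the score lies between 0 and 1, and a lower value means the samples sit closer to their own class means, that is, better relative separation between classes.

```python
import numpy as np


def swiss(X, labels):
    """Illustrative SWISS-style score (assumed definition: within-class SS / total SS).

    X: array of shape (n_samples, n_genes), e.g. one processed expression matrix.
    labels: length-n_samples array of predefined class labels.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Total sum of squares around the overall mean, summed over every gene.
    total_ss = np.sum((X - X.mean(axis=0)) ** 2)
    # Within-class sum of squares: each sample measured against its own class mean.
    within_ss = 0.0
    for c in np.unique(labels):
        Xc = X[labels == c]
        within_ss += np.sum((Xc - Xc.mean(axis=0)) ** 2)
    return within_ss / total_ss


# Toy example: four samples measured on two genes.
X = [[0, 0], [0, 2], [10, 0], [10, 2]]
print(swiss(X, ["a", "a", "b", "b"]))  # classes split along the large gene-1 gap
print(swiss(X, ["a", "b", "a", "b"]))  # classes split along the small gene-2 gap
```

To compare two processing methods as the abstract describes, one would compute this score on each processed version of the same samples with the same class labels and prefer the method with the lower value.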
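Sequencing quality scores are conventionally reported on the Phred scale, Q = -10 log10(P), where P is the probability that the base call is wrong. The sketch below converts a reported score to its implied error probability and then applies a simple empirical recalibration. It groups calls by reported score, estimates the observed error rate in each group, and converts that rate back to a Phred score. This binning approach is a generic stand-in, not necessarily the recalibration method developed in the dissertation. It also assumes a hypothetical `is_error` array in which each call's correctness is already known, for example from alignment to a trusted reference. The helper names are illustrative.

```python
import numpy as np


def phred_to_error_prob(q):
    """Convert a reported Phred quality Q to its implied error probability 10^(-Q/10)."""
    return 10.0 ** (-np.asarray(q, dtype=float) / 10.0)


def empirical_recalibration(reported_q, is_error):
    """Hypothetical helper: map each reported quality to an empirically observed Phred score.

    reported_q: integer quality score reported for each base call.
    is_error: booleans, True where the call is known to be wrong (assumed available).
    Returns a dict {reported Q: empirical Q}.
    """
    reported_q = np.asarray(reported_q)
    is_error = np.asarray(is_error, dtype=bool)
    table = {}
    for q in np.unique(reported_q):
        mask = reported_q == q
        # Add-one smoothing keeps bins with zero observed errors at a finite score.
        err_rate = (is_error[mask].sum() + 1) / (mask.sum() + 2)
        table[int(q)] = float(-10.0 * np.log10(err_rate))
    return table


# 998 calls all reported as Q30 (implied error rate 0.001), 9 of which are wrong.
table = empirical_recalibration([30] * 998, [True] * 9 + [False] * 989)
print(phred_to_error_prob(30), table[30])
```

In this toy case the smoothed observed error rate is 10/1000 = 0.01, so calls reported as Q30 behave like Q20 calls. This is the kind of miscalibration the abstract describes.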
Date of publication
Resource type
Rights statement
  • In Copyright
  • ... in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Statistics and Operations Research.
  • Marron, James Stephen
Degree granting institution
  • University of North Carolina at Chapel Hill
