Multiple testing in genome-wide studies

Last Modified
  • March 21, 2019
  • Kang, Moonsu
    • Affiliation: Gillings School of Global Public Health, Department of Biostatistics
  • DNA microarray technologies allow us to monitor the expression levels of thousands of genes simultaneously. A basic task in analyzing microarray data is the identification of genes that are differentially expressed under different experimental conditions. The null hypothesis is that there is no association between the expression levels and the explanatory variables or covariates. Controlling the family-wise error rate (FWER) guards against type I error but is very conservative. The false discovery rate (FDR) is a less stringent criterion that aims to control the expected proportion of type I errors among the rejected hypotheses. Because thousands of genes are tested simultaneously, the FDR may be enhanced. High correlation between tested genes, attributed to co-regulation and dependence in the measurement errors, further complicates the problem. Most current FDR procedures assume independence or rather restrictive dependence structures, which makes them less reliable. In this work, we address these very large multiplicity problems by adopting a two-stage FDR controlling procedure under suitable dependence structures, based on a Poisson distributional approximation, which eliminates the need to assume restricted dependence structures. We compare the performance of the proposed FDR procedure with that of other FDR controlling procedures, using the leukemia microarray study of Golub et al. (1999) and simulated data for illustration. In these studies, the proposed FDR procedure has greater power without much elevation of the FDR. Current FDR procedures have not been used extensively for genomic sequences involving count, discrete, or purely qualitative responses under high-dimensional, low-sample-size constraints. Using the 2002-03 SARS epidemic model, it is shown that the proposed FDR procedure, together with an appropriate test statistic based on a pseudo-marginal approach with Hamming distance, performs better.
Finally, for classification of dependent genes with heterogeneity in a small sample, standard robust inference may not work. This issue involves setting up hypotheses when the parameters of interest are subject to inequality restrictions. The usual (restricted) likelihood-based statistical inference procedures may be computationally intensive. Roy's union-intersection principle may be a viable alternative. The breast cancer study of Lobenhofer et al. is included for numerical illustration.
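
To make the contrast between FWER and FDR control in the abstract concrete, the following minimal sketch compares the Bonferroni correction (an FWER procedure) with the Benjamini-Hochberg step-up procedure (a standard FDR procedure that relies on independence or a restricted form of positive dependence). It is not the two-stage, Poisson-approximation procedure proposed in this work. The function names and the p-values are hypothetical and chosen only for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """FWER control: reject H_i when p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """FDR control (step-up): find the largest rank k with
    p_(k) <= alpha * k / m and reject the k smallest p-values."""
    m = len(p_values)
    # Indices of the hypotheses, ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # number of hypotheses to reject
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject


# Hypothetical p-values for eight genes (illustration only, not real data).
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]

fwer_reject = bonferroni(p_vals, alpha=0.05)
fdr_reject = benjamini_hochberg(p_vals, alpha=0.05)

print("Bonferroni rejections:", sum(fwer_reject))
print("Benjamini-Hochberg rejections:", sum(fdr_reject))
```

On these illustrative values the FDR procedure rejects one more hypothesis than the FWER procedure, which is the gain in power that motivates FDR control when thousands of genes are tested.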
Date of publication
Resource type
Rights statement
  • In Copyright
  • Sen, Pranab Kumar
Degree granting institution
  • University of North Carolina at Chapel Hill
  • Open access
