Multiple testing in genome-wide studies

Kang, Moonsu

Last Modified

March 21, 2019

Creator

Kang, Moonsu
- Affiliation: Gillings School of Global Public Health, Department of Biostatistics

Abstract

DNA microarray technologies allow us to monitor the expression levels of thousands of genes simultaneously. A basic task in analyzing microarray data is the identification of genes that are differentially expressed under different experimental conditions. The null hypothesis is that there is no association between the expression levels and the explanatory variables or covariates. Controlling the family-wise error rate (FWER) controls type I error but is very conservative. The false discovery rate (FDR) is a less stringent criterion that aims to control the expected proportion of type I errors among the rejected hypotheses. Since thousands of genes are tested simultaneously, the FDR may be elevated. High correlation between tested genes, attributed to co-regulation and dependency in the measurement errors, further complicates the problem. Most current FDR procedures assume independence or rather restrictive dependence structures, which makes them less reliable. In this work, we address these very large multiplicity problems by adopting a two-stage FDR controlling procedure under suitable dependence structures, based on a Poisson distributional approximation, which eliminates the need to assume restricted dependence structures. We compare the performance of the proposed FDR procedure with that of other FDR controlling procedures, using the leukemia microarray study of Golub et al. (1999) and simulated data for illustration. In these studies, the proposed FDR procedure has greater power without much elevation of the FDR. Current FDR procedures have not been used extensively for genomic sequences involving count, discrete, or purely qualitative responses under high-dimensional, low-sample-size constraints. Using the 2002-03 SARS epidemic model, it is shown that the proposed FDR procedure, together with an appropriate test statistic based on a pseudo-marginal approach with Hamming distance, performs better.
Finally, for the classification of dependent genes with heterogeneity in a small sample, standard robust inference may not be adequate. This issue involves setting up a hypothesis when the parameters of interest are subject to inequality restrictions. The usual (restricted) likelihood-based statistical inference procedures may be computationally intensive. Roy's union-intersection principle may be a viable alternative. The breast cancer study of Lobenhofer et al. is included for numerical illustration.
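
To make the contrast between FWER and FDR control in the abstract concrete, here is a minimal Python sketch comparing the Bonferroni correction (FWER) with the standard Benjamini-Hochberg step-up procedure (FDR). This is not the two-stage, Poisson-approximation procedure proposed in the dissertation. The Benjamini-Hochberg procedure shown here is valid under independence or positive dependence, which is the kind of assumption the abstract describes as restrictive. The function names and the p-values are hypothetical and chosen only for illustration.

```python
from typing import List


def bonferroni_reject(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """FWER control: reject hypothesis i when p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """FDR control (step-up): find the largest rank k with
    p_(k) <= k * alpha / m and reject the k smallest p-values."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank
    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


# Hypothetical p-values for six genes (illustration only, not study data).
example_p = [0.001, 0.008, 0.012, 0.041, 0.20, 0.74]
bonf = bonferroni_reject(example_p)
bh = benjamini_hochberg_reject(example_p)
print("Bonferroni rejections:", sum(bonf))
print("Benjamini-Hochberg rejections:", sum(bh))
```

On these illustrative p-values, Benjamini-Hochberg rejects one more hypothesis than Bonferroni, which reflects the less stringent nature of FDR control described above.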

Date of publication

December 2007

DOI

https://doi.org/10.17615/v640-8v04

Resource type

Dissertation

Rights statement

In Copyright

Advisor

Sen, Pranab Kumar

Degree granting institution

University of North Carolina at Chapel Hill

Language

English

Access right

Open access

Date uploaded

October 19, 2010

