Statistical tools for general association testing and control of false discoveries in group testing

Last Modified
  • March 19, 2019
  • Rudra, Pratyaydipta
    • Affiliation: Gillings School of Global Public Health, Department of Biostatistics
  • In modern applications of high-throughput technologies, it is important to identify pairwise associations between variables, and desirable to use methods that are powerful and sensitive to a variety of association relationships. In the first part of the dissertation, we describe RankCover, a new non-parametric test of association between two variables that measures the concentration of paired ranked points. Here "concentration" is quantified using a disk-covering statistic similar to those employed in spatial data analysis. Analysis of simulated datasets demonstrates that the method is robust and often powerful in comparison to competing general association tests. We also illustrate RankCover in the analysis of several real datasets and, using RankCover, propose a method of testing the association of two variables while controlling for the effect of a third variable. In the second part of the dissertation, we describe statistical methodologies for testing hypotheses that can be collected into groups, with each group showing potentially different characteristics. Methods to control the family-wise error rate or false discovery rate for group testing have been proposed earlier, but may not easily apply to expression quantitative trait loci (eQTL) data, for which certain structured alternatives may be defensible and enable the researcher to avoid overly conservative approaches. In an empirical Bayesian setting, we propose a new method to control the false discovery rate (FDR) for grouped hypothesis data. Here, each gene forms a group, with SNPs annotated to the gene corresponding to individual hypotheses. Heterogeneity of effect sizes across groups is modeled through a random effects component. Our method, called Random Effects model and testing procedure for Group-level FDR control (REG-FDR), assumes a model for alternative hypotheses for the eQTL data and controls the FDR by adaptive thresholding.
Finally, we propose Z-REG-FDR, an approximate version of REG-FDR that uses only Z-statistics of association between genotype and expression at each SNP. Simulations demonstrate that Z-REG-FDR performs similarly to REG-FDR, but with much improved computational speed. We further propose an extension of Z-REG-FDR to a multi-tissue setting, providing the basis for gene-based multi-tissue analysis.
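
The abstract does not spell out the thresholding rule used by REG-FDR. As a rough illustration of what adaptive thresholding can look like in an empirical Bayesian setting, the sketch below applies a generic rule: given a posterior probability that each group (gene) is null, reject the groups with the smallest probabilities for as long as their running average stays at or below the target FDR level. The function name `adaptive_lfdr_threshold`, the input values, and the choice of this particular rule are assumptions made for illustration, not the dissertation's procedure.

```python
import numpy as np


def adaptive_lfdr_threshold(post_null, alpha=0.05):
    """Generic empirical-Bayes FDR rule (illustrative, not REG-FDR itself).

    post_null : posterior probability that each group is null.
    alpha     : target FDR level.
    Returns a boolean mask marking the rejected groups.
    """
    post_null = np.asarray(post_null, dtype=float)
    reject = np.zeros(post_null.size, dtype=bool)
    if post_null.size == 0:
        return reject

    # Sort groups from most to least likely to be non-null.
    order = np.argsort(post_null, kind="stable")

    # Average posterior null probability among the first k groups; this
    # estimates the FDR incurred by rejecting exactly those k groups.
    running_mean = np.cumsum(post_null[order]) / np.arange(1, post_null.size + 1)

    # The running mean is non-decreasing, so the last index that passes
    # gives the largest rejection set whose estimated FDR is <= alpha.
    passing = np.nonzero(running_mean <= alpha)[0]
    if passing.size:
        reject[order[: passing[-1] + 1]] = True
    return reject


# Hypothetical posterior null probabilities for six genes.
post_null = [0.01, 0.40, 0.03, 0.90, 0.08, 0.02]
reject = adaptive_lfdr_threshold(post_null, alpha=0.05)
print(reject.tolist())
```

The threshold is adaptive in the sense that the cutoff on the posterior null probability is not fixed in advance; it depends on how many groups in the data carry strong evidence against the null.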
Date of publication
Resource type
Rights statement
  • In Copyright
  • Wright, Fred A.
  • Mohlke, Karen
  • Nobel, Andrew
  • Li, Yun
  • Sun, Wei
  • Doctor of Philosophy
Degree granting institution
  • University of North Carolina at Chapel Hill Graduate School
Graduation year
  • 2015
Place of publication
  • Chapel Hill, NC
  • There are no restrictions to this item.
