Novel statistical methods for the study design and analysis of genome-wide association studies

Last Modified
  • March 22, 2019
  • Ho, Lindsey Allen
    • Affiliation: Gillings School of Global Public Health, Department of Biostatistics
  • In Chapter 2, we compare the power of association studies using cases and screened controls to studies that incorporate free public control genotype data. We describe a two-stage replication-based design, which uses free public control genome-wide genotype data in the first stage and follow-up genotype data on study controls in the second stage. We assess the impact of systematic ancestry differences and batch genotype effects. We show that the proposed two-stage replication-based design can dramatically increase the statistical power and decrease the cost of large-scale genetic association studies. In Chapter 3, we describe conventional haplotype analysis approaches and compare them to a number of haplotype sharing measures. We evaluate the impact of including markers in linkage disequilibrium (LD) on power and assess the utility of recoding scores using thresholds. Finally, we develop a quick and novel approach based on categorizing similar haplotypes into contingency tables. These alternative methods are compared via simulation, assuming a rare recessive disorder caused by a small number of highly penetrant mutations within a single disease locus. We found that incorporating allele frequencies and dichotomizing scores increased power. Conversely, using fixed windows and excluding single nucleotide polymorphisms (SNPs) in low LD or with low minor allele frequencies decreased power. Finally, we show that our novel clustering algorithm had power competitive with that of permutation testing. In Chapter 4, we describe an alternative to single-SNP analyses of single or multiple candidate genes that is designed to increase power when multiple SNPs are associated with the trait. Our method is based on forward selection in regression and provides a joint test of the statistical significance of a gene. Within the framework of a simulated candidate gene study, as well as a study of related candidate genes, we assess the power of this method by simulating a quantitative trait, and we compare our proposed method to single-SNP and other multiple-SNP models. Our results suggest that our method is competitive with conventional methods and may be more powerful when SNP x SNP interactions exist.
Date of publication
Resource type
Rights statement
  • In Copyright
  • "... in partial fulfillment of the requirements for the degree of Doctor of Public Health in the Department of Biostatistics, Gillings School of Global Public Health."
  • Lange, Ethan
Degree granting institution
  • University of North Carolina at Chapel Hill
Place of publication
  • Chapel Hill, NC
  • Open access
