Statistical aspects of haplotype-based association studies

Last Modified
  • March 21, 2019
  • Huang, Bevan Emma
    • Affiliation: Gillings School of Global Public Health, Department of Biostatistics
  • A decade ago, genomewide association studies were proposed as a tool to unravel the genetic basis of complex diseases. Only now are they becoming practical, thanks to improved technology and lower genotyping costs. For such studies, power and efficiency are crucial because of the large number of markers genotyped and the moderate effect sizes involved. Haplotype-based analysis incorporates information from multiple markers, so it is potentially more powerful than single-SNP analysis. Unfortunately, it is also more computationally intensive, and because haplotypes are not directly observed, it poses a major analytical challenge. Several methods are available to infer individual haplotypes from unphased genotype data, but using the inferred haplotypes in the subsequent association analysis can bias estimates and reduce power. We investigate the situations in which the disadvantages of this imputation process may outweigh its convenience. In addition, we describe alternatives to imputation that yield efficient haplotype association analysis. For case-control studies, we develop methods for genomewide studies that account for the correlation between SNPs when correcting for multiple testing. Simulation studies based on the HapMap data showed that the proposed approach performs well in realistic situations. We applied it to a case-control dataset of 2,300 SNPs to test for association with rheumatoid arthritis. For quantitative trait loci, we focus on the gains in power that can be achieved with selective genotyping designs, in which only individuals with extreme phenotypes are genotyped. Because selection depends on the phenotype, the resulting data cannot be properly analyzed by standard statistical methods. We provide appropriate likelihoods for assessing the effects of genotypes and haplotypes on quantitative traits under such designs.
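To see why phenotype-dependent selection calls for a modified likelihood, consider a single SNP under an additive linear model Y = b0 + b1·G + e with e ~ N(0, σ²), where only individuals whose trait value falls below c_lo or above c_hi are genotyped. The sketch below maximizes the likelihood of Y conditional on G and on being selected, which divides each normal density by the probability of selection given the genotype. This is a simpler conditional likelihood chosen for illustration, not the full likelihood developed in this work. It assumes known, fixed thresholds, normal errors, and directly observed genotypes (no haplotype ambiguity, no covariates), and the function names `neg_loglik` and `fit_selected` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def neg_loglik(params, y, g, c_lo, c_hi):
    """Negative conditional log-likelihood of Y given G and selection.

    Hypothetical helper: additive model Y = b0 + b1*G + e, e ~ N(0, sigma^2),
    with individuals genotyped only if Y < c_lo or Y > c_hi.
    """
    b0, b1, log_sigma = params
    sigma = np.exp(log_sigma)  # optimize on the log scale so sigma stays positive
    mu = b0 + b1 * g
    # Log density of each observed trait value under the linear model
    log_f = norm.logpdf(y, loc=mu, scale=sigma)
    # Probability that an individual with this genotype lands in either tail
    p_sel = norm.cdf((c_lo - mu) / sigma) + norm.sf((c_hi - mu) / sigma)
    return -np.sum(log_f - np.log(p_sel))


def fit_selected(y, g, c_lo, c_hi):
    """Return (b0, b1, sigma) maximizing the conditional likelihood."""
    # Start from the (biased) naive least-squares fit on the selected sample
    slope, intercept = np.polyfit(g, y, 1)
    resid_sd = np.std(y - intercept - slope * g)
    start = np.array([intercept, slope, np.log(resid_sd)])
    res = minimize(
        neg_loglik, start, args=(y, g, c_lo, c_hi),
        method="Nelder-Mead",
        options={"maxiter": 5000, "xatol": 1e-6, "fatol": 1e-6},
    )
    b0, b1, log_sigma = res.x
    return b0, b1, np.exp(log_sigma)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 20000
    g = rng.binomial(2, 0.3, n).astype(float)  # additive genotype coding 0/1/2
    y = 0.3 * g + rng.normal(0.0, 1.0, n)      # simulated truth: b1 = 0.3, sigma = 1
    c_lo, c_hi = np.quantile(y, [0.2, 0.8])     # genotype only the lower and upper 20%
    sel = (y < c_lo) | (y > c_hi)
    naive_slope = np.polyfit(g[sel], y[sel], 1)[0]
    b0, b1, sigma = fit_selected(y[sel], g[sel], c_lo, c_hi)
    print(f"naive OLS slope on selected sample: {naive_slope:.3f}")
    print(f"conditional MLE: b1={b1:.3f}, sigma={sigma:.3f}")
```

Fitting ordinary least squares to the selected individuals alone overstates the effect, because selecting on Y inflates the trait variance in the sample; the conditional likelihood corrects for this. It does not, however, use the phenotypes of the individuals who were not genotyped, which a full likelihood can exploit.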
We demonstrate that the likelihood-based methods are highly effective at identifying causal variants and are substantially more powerful than existing methods. We first consider two practical designs and then extend the methods to a two-phase sampling design. We also provide methods to test for haplotype-disease association in the presence of covariates. Simulations demonstrate the effectiveness of these likelihood-based methods.
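The abstract does not spell out how SNP correlation enters the multiple-testing correction for case-control studies. One common way to do this, shown here only as an illustrative sketch and not necessarily the procedure developed in this work, is a Monte Carlo max-statistic adjustment. It assumes the per-SNP test statistics are approximately multivariate normal under the null, with a known or estimated correlation matrix (for example, induced by linkage disequilibrium). It draws null statistics with that correlation and compares each observed |Z| with the simulated distribution of the largest |Z|. The function name `mc_adjusted_pvalues` and its parameters are hypothetical.

```python
import numpy as np


def mc_adjusted_pvalues(z_obs, corr, n_sim=20000, seed=0):
    """Family-wise adjusted p-values from the maximum |Z| over correlated tests.

    Hypothetical helper. Assumes the per-SNP statistics are approximately
    N(0, 1) under the global null with correlation matrix `corr`
    (for example, induced by linkage disequilibrium between nearby SNPs).
    """
    z_abs = np.abs(np.asarray(z_obs, dtype=float))
    rng = np.random.default_rng(seed)
    # Draw null statistics that share the correlation of the observed tests
    sims = rng.multivariate_normal(np.zeros(len(z_abs)), corr, size=n_sim)
    max_abs = np.abs(sims).max(axis=1)
    # Adjusted p-value: share of null replicates whose largest |Z| is at least
    # as extreme as the observed |Z| for that SNP
    return (max_abs[None, :] >= z_abs[:, None]).mean(axis=1)


if __name__ == "__main__":
    z = [1.96, 0.5, 0.1, 0.2, 0.3]
    indep = np.eye(5)
    strong_ld = 0.95 * np.ones((5, 5)) + 0.05 * np.eye(5)
    # Independent tests: close to the Sidak value 1 - 0.95**5
    print(mc_adjusted_pvalues(z, indep)[0])
    # Nearly redundant tests: much smaller, approaching the nominal 0.05
    print(mc_adjusted_pvalues(z, strong_ld)[0])
```

Under independence this adjustment reduces to roughly the Šidák correction, whereas Bonferroni ignores the correlation entirely; when nearby SNPs are in strong linkage disequilibrium, the simulated maxima behave more like a single test and the adjusted p-values are correspondingly less conservative.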
Date of publication
Resource type
Rights statement
  • In Copyright
  • Lin, Danyu
Degree granting institution
  • University of North Carolina at Chapel Hill
  • Open access
