Quantitative methods for evaluating association between multiple rare genetic variants and complex human traits

Byrnes, Andrea Elizabeth

Last Modified

March 19, 2019

Creator

Byrnes, Andrea Elizabeth
- Affiliation: Gillings School of Global Public Health, Department of Biostatistics

Abstract

First, we propose two methods for aggregating rare variants in data from Genome-wide Association Studies (GWAS): a weighted haplotype-based approach and an imputation-based approach. Both methods can incorporate external sequencing data when available. Our methods show enhanced statistical power over existing methods across a wide range of population-attributable risk, percentage of disease-contributing rare variants, and proportion of rare alleles acting in different directions. We thus demonstrate that the evaluation of rare variants with GWAS data is possible, particularly when public sequencing data are incorporated. Second, we present a systematic evaluation of multiple weighting schemes through a series of simulations intended to mimic large sequencing studies of a quantitative trait. We evaluate existing phenotype-independent and phenotype-dependent methods, as well as weights estimated by penalized regression. We find that the difference in power between phenotype-dependent schemes is negligible when high-quality functional annotations are available. When functional annotations are unavailable or incomplete, all methods lose power; however, the variable selection methods outperform the others at the cost of increased computational time. In the absence of highly accurate annotation, we recommend applying variable selection methods (which can be viewed as statistical annotation) to regions implicated by a phenotype-independent weighting scheme. Finally, we propose a method to apply the Sequence Kernel Association Test (SKAT), a similarity-based approach for rare variant association, to data from admixed populations by first estimating local ancestry for each variant. In simulations, we find that when the true causal alleles are causal in only one ancestral population, our proposed approaches show a marked improvement in power over the original SKAT method. In real data, our results support the previously reported European-specific association and illustrate the increased statistical power of the proposed methods to find such associations.
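
For readers unfamiliar with similarity-based rare-variant tests, the following is a minimal sketch of a SKAT-style variance-component statistic for a quantitative trait. It is not the method proposed in the dissertation: it omits covariates and the local-ancestry step, and it uses a permutation p-value in place of the analytic null distribution. The function names are hypothetical, and the Beta(1, 25) weights on minor allele frequency are an assumed phenotype-independent weighting scheme chosen for illustration.

```python
import numpy as np

def beta_1_25_weights(maf):
    """Phenotype-independent weights: the Beta(1, 25) density evaluated at
    each minor allele frequency, which equals 25 * (1 - maf)**24.
    (Assumed weighting scheme, for illustration only.)"""
    maf = np.asarray(maf, dtype=float)
    return 25.0 * (1.0 - maf) ** 24

def skat_q(y, G, weights):
    """SKAT-style statistic with no covariates:
    Q = sum_j (w_j * sum_i (y_i - mean(y)) * G_ij)**2,
    where G is an (n_samples, n_variants) genotype matrix."""
    y = np.asarray(y, dtype=float)
    G = np.asarray(G, dtype=float)
    resid = y - y.mean()          # residuals under the intercept-only null
    scores = resid @ G            # one score per variant
    return float(np.sum((np.asarray(weights) * scores) ** 2))

def permutation_pvalue(y, G, weights, n_perm=999, seed=0):
    """Empirical p-value obtained by permuting the phenotype.
    (A stand-in for the analytic null distribution.)"""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    q_obs = skat_q(y, G, weights)
    q_null = np.array(
        [skat_q(rng.permutation(y), G, weights) for _ in range(n_perm)]
    )
    p = (1 + np.sum(q_null >= q_obs)) / (n_perm + 1)
    return q_obs, float(p)

if __name__ == "__main__":
    # Simulated data: 500 individuals, 20 rare variants, null phenotype.
    rng = np.random.default_rng(1)
    true_maf = rng.uniform(0.001, 0.02, size=20)
    G = rng.binomial(2, true_maf, size=(500, 20))
    y = rng.normal(size=500)
    w = beta_1_25_weights(G.mean(axis=0) / 2)  # weights from sample MAF
    q, p = permutation_pvalue(y, G, w)
    print(f"Q = {q:.2f}, permutation p = {p:.3f}")
```

Because the weights depend only on allele frequency, they can be computed before looking at the phenotype. A phenotype-dependent or penalized-regression scheme would replace `beta_1_25_weights` while leaving the statistic unchanged.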

Date of publication