Association Analysis of Rare Variants in Sequencing Studies

Recent advances in sequencing technologies have made it possible to explore the influence of rare variants on complex diseases and traits. Large-scale sequencing studies provide the opportunity to examine the proportion of the missing heritability that is attributable to rare variants. They also pose a range of analytical and computational challenges that cannot be adequately addressed with existing methods. In association analysis of rare variants, it is customary to aggregate the rare mutations within a gene and test for association at the gene level. In the first part of the dissertation, we develop asymptotic and resampling gene-level association tests for a variety of traits and study designs. We employ score statistics under appropriate statistical models to achieve numerical stability and computational efficiency. The resulting software, SCORE-Seq, features a large collection of utilities for performing gene-level association analysis in different scenarios. Trait-dependent sampling has been adopted in many sequencing projects to reduce cost. In the second part, we provide a valid and efficient maximum likelihood framework for analyzing binary secondary traits under such a sampling strategy. We construct the commonly used gene-level association tests and compare our methods with naive methods that ignore the trait-dependent sampling. A single sequencing study is often underpowered to detect modest genetic effects of rare variants. Several methods are available for meta-analysis of rare variants under fixed-effects models, which assume that the genetic effects are the same across all studies. In practice, genetic associations are likely to be heterogeneous among studies because of differences in population composition, environmental factors, phenotype and genotype measurements, or analysis methods. In the third part, we propose a general framework for meta-analysis of sequencing studies that allows the genetic effects to vary among studies. We construct fixed-effects and random-effects versions of all commonly used gene-level association tests. Our methods take score statistics, rather than individual participant data, as input and thus can accommodate any study design and any phenotype. We demonstrate through extensive simulation studies that our tests are more powerful than existing ones in a wide range of practical situations.
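
To illustrate what it means for a meta-analysis to take score statistics as input, the following is a minimal sketch of a fixed-effects burden test. It is not the SCORE-Seq implementation and does not cover the random-effects tests proposed in the dissertation. The function names `burden_score` and `combine_fixed_effects` are hypothetical, and the sketch assumes a quantitative trait, no covariates, and equal variant weights by default.

```python
import math


def burden_score(genotypes, trait, weights=None):
    """Score statistic U and its variance V for one study (hypothetical helper).

    genotypes: list of per-subject lists of rare-allele counts within a gene.
    trait: list of quantitative trait values, one per subject.
    Assumes an intercept-only null model (no covariates).
    """
    n = len(trait)
    m = len(genotypes[0])
    if weights is None:
        weights = [1.0] * m  # assumption: unweighted burden
    # Aggregate rare variants within the gene into one burden score per subject.
    burden = [sum(w * g for w, g in zip(weights, row)) for row in genotypes]
    y_bar = sum(trait) / n
    s_bar = sum(burden) / n
    # Score statistic: covariance-like sum of trait residuals and burden scores.
    u = sum((y - y_bar) * s for y, s in zip(trait, burden))
    # Residual variance under the null, times the centered sum of squares of the burden.
    sigma2 = sum((y - y_bar) ** 2 for y in trait) / n
    v = sigma2 * sum((s - s_bar) ** 2 for s in burden)
    return u, v


def combine_fixed_effects(study_stats):
    """Combine per-study (U, V) pairs assuming a common genetic effect.

    Returns the combined Z statistic and a two-sided normal p-value.
    """
    u_total = sum(u for u, _ in study_stats)
    v_total = sum(v for _, v in study_stats)
    z = u_total / math.sqrt(v_total)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

Each study only needs to share its pair (U, V), so no individual participant data changes hands. A random-effects version would additionally have to model between-study variation in the genetic effect, which this sketch does not attempt.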