Genetic Association Analysis on Secondary Phenotypes and Group Conditional Variable Importance in OPPERA Study

Last Modified
  • March 20, 2019
Creator
  • Xue, Wei
    • Affiliation: Gillings School of Global Public Health, Department of Biostatistics
Abstract
  • Temporomandibular disorder (TMD) is a complex, chronic, painful orofacial disorder that causes dysfunction of the temporomandibular joints and the muscles around the jaw. Numerous risk factors for the onset of TMD and for chronic TMD have been studied and identified. The Orofacial Pain: Prospective Evaluation and Risk Assessment (OPPERA) study is a prospective study designed to investigate the etiology of TMD and the risk factors contributing to its onset and chronicity (Smith et al. 2011). Genetic risk factors play an important role in the etiology of TMD. While many studies have identified genetic variants associated with TMD case-control status, one may also wish to identify genetic markers associated with secondary phenotypes (such as clinical pain) that are related to the severity of TMD. In such cases, naive regression methods that ignore the case-control design produce biased results. This problem may be corrected by statistical methods such as inverse probability weighting (IPW). However, IPW may be unreliable when the genetic markers and secondary phenotypes are strongly associated with case-control status. In the first project, to perform unbiased association analysis, we proposed a novel permutation-based IPW method and compared it with the conventional IPW method. The results indicated that the permutation-based IPW method controlled the type I error rate with no loss in power. Application to data from the OPPERA study identified SNPs associated with the severity of orofacial pain.
Numerous risk factors were studied in previous OPPERA studies to shed light on the etiology of TMD. Researchers would like to know whether a group of variables has a high variable importance (VIMP) score conditional on the existing risk factors, that is, whether the group brings additional information for predicting the outcome beyond the existing variables.
For example, researchers may want to know whether the group importance score of the variables measuring mechanical and thermal pain sensitivity differs significantly from 0, conditional on all the other variables, when predicting either chronic or first-onset TMD. In the second project, we proposed a method to test group conditional variable importance statistically, using a random forest model and the conditional distribution of the variables in the group given the variables outside the group. Simulations were performed with both continuous and categorical variables, and p-values were calculated for some groups. This method addresses the tendency to select correlated variables because of spurious correlation, and it provides a way to test groups of variables without bias. In the third project, the methodology from the second project was applied to data from the OPPERA case-control and cohort studies, for both chronic and first-onset TMD. Correlated risk factors were grouped into subsets, and each subset was tested with the proposed method against the null hypothesis that its group VIMP score, conditional on the remaining TMD risk factors in the data set, is 0. Several groups of variables were found to carry additional information about chronic TMD beyond the existing risk factors. However, none of the proposed groups was found to be conditionally important in the first-onset TMD study.
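The bias that IPW corrects can be illustrated with a small simulation. The Python sketch below implements only the conventional IPW estimator mentioned in the abstract, not the permutation-based method, whose details are not given here. It assumes a hypothetical population in which a SNP has no effect on a continuous secondary phenotype but both raise disease risk, and it assumes the case and control sampling fractions are known; the function names `ipw_slope` and `simulate`, the effect sizes, and the sample sizes are all illustrative choices.

```python
import math
import random


def ipw_slope(genotype, phenotype, is_case, p_case, p_control):
    """Weighted least-squares slope of a secondary phenotype on genotype.

    Each subject is weighted by 1 / (probability of being sampled), where
    p_case and p_control are the (assumed known) sampling fractions of
    cases and controls from the source population.
    """
    w = [1.0 / (p_case if c else p_control) for c in is_case]
    sw = sum(w)
    g_bar = sum(wi * g for wi, g in zip(w, genotype)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, phenotype)) / sw
    cov = sum(wi * (g - g_bar) * (y - y_bar)
              for wi, g, y in zip(w, genotype, phenotype))
    var = sum(wi * (g - g_bar) ** 2 for wi, g in zip(w, genotype))
    return cov / var


def simulate(n_pop=200_000, n_each=1000, seed=1):
    """Hypothetical population: genotype g has NO effect on phenotype y,
    but both raise disease risk. Draw n_each cases and n_each controls."""
    rng = random.Random(seed)
    pop = []
    for _ in range(n_pop):
        g = (rng.random() < 0.3) + (rng.random() < 0.3)  # minor-allele count, allele freq 0.3
        y = rng.gauss(0.0, 1.0)  # secondary phenotype, independent of g
        p = 1.0 / (1.0 + math.exp(-(-4.0 + 1.0 * g + 1.5 * y)))  # disease risk
        pop.append((g, y, rng.random() < p))
    cases = [s for s in pop if s[2]]
    controls = [s for s in pop if not s[2]]
    sample = rng.sample(cases, n_each) + rng.sample(controls, n_each)
    g, y, d = zip(*sample)
    # Equal weights reduce to ordinary (naive) least squares.
    naive = ipw_slope(g, y, d, 1.0, 1.0)
    # Sampling fractions are known here because the population is simulated.
    ipw = ipw_slope(g, y, d, n_each / len(cases), n_each / len(controls))
    return naive, ipw
```

Running `naive, ipw = simulate()` returns the unweighted and weighted slopes. In this setup the weighted slope should be close to the true value of 0, while the naive slope can be pulled away from it because case status depends on both the SNP and the phenotype.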
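The idea of group conditional importance can be sketched in a model-agnostic way. The Python code below is a hypothetical illustration, not the method proposed in the dissertation: it approximates the conditional distribution of the group given the remaining variables with a linear fit plus jointly permuted residuals (a swapped-in simplification), and it takes any `predict` function, for example the prediction function of a fitted random forest, as an assumption. The name `conditional_group_importance` is illustrative; the function returns the average increase in mean squared error and does not implement the hypothesis test for a VIMP score of 0.

```python
import random


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _ols_fit(z, y):
    """Least-squares coefficients (intercept first) of y on the columns of z."""
    rows = [[1.0] + list(zr) for zr in z]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return _solve(xtx, xty)


def conditional_group_importance(x, y, predict, group, n_perm=20, seed=0):
    """Average increase in MSE when the columns in `group` are redrawn from an
    approximate conditional distribution given the other columns."""
    rng = random.Random(seed)
    n, p = len(x), len(x[0])
    rest = [j for j in range(p) if j not in group]
    z = [[row[j] for j in rest] for row in x]
    fitted, resid = {}, {}
    for j in group:
        # Linear model of group column j on the columns outside the group.
        beta = _ols_fit(z, [row[j] for row in x])
        fitted[j] = [beta[0] + sum(b * v for b, v in zip(beta[1:], zr)) for zr in z]
        resid[j] = [row[j] - f for row, f in zip(x, fitted[j])]

    def mse(data):
        return sum((predict(r) - yi) ** 2 for r, yi in zip(data, y)) / n

    base = mse(x)
    increases = []
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)  # one shared permutation keeps within-group dependence
        x_new = [row[:] for row in x]
        for i in range(n):
            for j in group:
                x_new[i][j] = fitted[j][i] + resid[j][perm[i]]
        increases.append(mse(x_new) - base)
    return sum(increases) / n_perm
```

Permuting residuals rather than raw columns keeps the redrawn values consistent with the variables outside the group, so a group is credited only with information the other variables do not already carry. Using one shared permutation for all columns in the group preserves the dependence among them.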
Date of publication
Resource type
Rights statement
  • In Copyright
Advisor
  • Smith, Shad
  • Bair, Eric
  • Li, Yun
  • Chen, Mengjie
  • Zeng, Donglin
Degree
  • Doctor of Public Health
Degree granting institution
  • University of North Carolina at Chapel Hill Graduate School
Graduation year
  • 2016
