Analysis of Time-to-event Data, Intermediate Phenotypes, and Sparse Factors in the OPPERA Study

Motivated by the Orofacial Pain: Prospective Evaluation and Risk Assessment (OPPERA) project, a large study of temporomandibular disorders (TMD), this dissertation develops statistical methods applicable to three facets of chronic pain research. First, we propose a method for parameter estimation in survival models with missing censoring indicators. These indicators are missing because it is infeasible to conduct repeated invasive examinations for incident cases on every participant in a large prospective study. For participants lacking a gold-standard examination, we estimate the probability of being an incident case using logistic regression. These estimated probabilities are then used to generate multiple imputations of case status for each missing examination. Imputed and observed data are combined in Cox models to estimate the incidence rate and its associations with putative risk factors, and variances are estimated using multiple-imputation methods. Our method performs as well as or better than competing methods and revealed new findings for OPPERA.

Second, we propose a general method for analyzing secondary phenotypes and apply it to the OPPERA baseline case-control study. Traditional case-control genetic association studies examine the relationship between case-control status and one or more covariates. Investigators now commonly study additional phenotypes, and their associations with the same covariates, as secondary aims. Assessing these associations is statistically challenging because participants are not a random sample from the population of interest: their selection depends on case-control status. Standard methods may therefore yield biased estimates, confidence intervals with poor coverage, and low power. Our method uses inverse probability weighting, with bootstrap standard errors; it performs as well as competing methods when those methods are applicable and gives promising results for outcomes to which other methods do not apply.

Third, we propose a method for sparse factor analysis.
Psychometric studies frequently measure numerous variables that may be noisy manifestations of a few underlying constructs. The aims are typically to identify these latent variables and their relationships to the observed variables, and to reduce the data to a few key variables that explain most of the variance. While variable reduction methods exist for principal component analysis, none had been proposed for factor analysis. In simulations, our method retains predictive accuracy across many thresholds while providing sparse loadings, whereas competing methods had lower predictive accuracy or less sparsity.
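The imputation step of the first method (missing censoring indicators) can be illustrated with a short, dependency-free sketch. It is not the dissertation's implementation. The function name `impute_incidence_rate` and its arguments are hypothetical. The per-participant probabilities `p_case` are assumed to come from an already-fitted logistic regression. A crude incidence rate (events per unit of follow-up time, analyzed on the log scale with approximate variance 1/events) stands in for the Cox model. Estimates from the imputed datasets are pooled with Rubin's rules.

```python
import math
import random
import statistics


def impute_incidence_rate(follow_up, observed_case, p_case, n_imputations=20, seed=1):
    """Multiply impute missing case status and pool the log incidence rate.

    Hypothetical sketch: a crude rate replaces the Cox model used in the text.
    follow_up:     follow-up time for each participant
    observed_case: 1 or 0 if the examination was performed, None if it is missing
    p_case:        estimated P(incident case) per participant (assumed to come from
                   a fitted logistic regression); used only where observed_case is None
    Returns (pooled log rate, total variance from Rubin's rules).
    """
    if n_imputations < 2:
        raise ValueError("need at least two imputations to estimate between-variance")
    rng = random.Random(seed)
    total_time = sum(follow_up)
    estimates, variances = [], []
    for _ in range(n_imputations):
        # Keep observed status; draw missing status from Bernoulli(p_case).
        events = sum(
            obs if obs is not None else int(rng.random() < p)
            for obs, p in zip(observed_case, p_case)
        )
        if events == 0:
            raise ValueError("an imputed dataset has no events; log rate undefined")
        # Log incidence rate and its approximate (Poisson) variance 1/events.
        estimates.append(math.log(events / total_time))
        variances.append(1.0 / events)
    # Rubin's rules: pooled estimate, within- and between-imputation variance.
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)
    total_var = within + (1 + 1 / n_imputations) * between
    return pooled, total_var


if __name__ == "__main__":
    follow_up = [1.0, 2.5, 3.0, 1.5, 2.0, 3.0]
    observed = [1, 0, None, 0, None, 1]
    probs = [0.0, 0.0, 0.4, 0.0, 0.7, 0.0]
    log_rate, var = impute_incidence_rate(follow_up, observed, probs)
    print("rate:", math.exp(log_rate), "SE(log rate):", math.sqrt(var))
```

Because `p_case` is held fixed here, the sketch ignores uncertainty in the logistic-regression estimates and can understate the between-imputation variance. A fuller version would redraw those coefficients before each imputation, for example by refitting the logistic model on a bootstrap sample.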