Analysis of Interval Censored Data Using a Longitudinal Biomarker

In many medical studies, interest focuses on the effects of potential risk factors on disease events, where the occurrence time of an event may be defined by the behavior of a biomarker. For example, in diabetes studies, diabetes is defined as a fasting plasma glucose of 126 mg/dl or higher. In practice, several issues complicate determining the exact time of disease occurrence. First, because study follow-up visits are discrete, the exact time at which a biomarker crosses a given threshold is unobservable, yielding so-called interval-censored events. Second, most biomarker values are subject to measurement error due to imperfect technologies, so the observed values may not reflect the actual underlying biomarker levels. Third, a common threshold for defining a disease event may not be appropriate because of patient heterogeneity. Finally, informative diagnosis and subsequent treatment outside of the observational study may alter observations after the diagnosis. It is well known that a complete-case analysis that excludes externally diagnosed subjects can be biased when diagnosis does not occur completely at random. To resolve these four issues, we consider a semiparametric model for analyzing threshold-dependent time-to-event data defined by extreme-value-distributed biomarkers. First, we propose a semiparametric marginal model based on a generalized extreme value distribution. Assuming the latent error-free biomarkers to be non-decreasing, the proposed model implies a class of proportional hazards models for the time-to-event defined for any given threshold value. Second, we extend the marginal likelihood to a pseudo-likelihood by multiplying the likelihoods over all observation times. Finally, to adjust for externally diagnosed cases, we consider a weighted pseudo-likelihood estimator that incorporates inverse probability weights into the pseudo-likelihood, assuming that external diagnosis depends on observed data rather than unobserved data. We estimate the parameters of the three models using the nonparametric EM, pseudo-EM, and weighted pseudo-EM algorithms, respectively. We investigate the models and estimation methods theoretically: consistency, convergence rates, and asymptotic distributions of the estimators are investigated using empirical process techniques. We also provide a series of simulations that test each model and estimation method and compare them against alternatives. As a practical illustration, we apply each model to data from the ARIC study and the diabetes ancillary study of the ARIC study.
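
To make the first issue concrete, the following minimal sketch shows how a threshold-defined event becomes interval censored when the biomarker is only observed at discrete visits. It is an illustration only, not the method of the thesis: the function name `censoring_interval`, the example visit times and glucose values, and the convention that follow-up starts at time 0 are all assumptions, and the sketch ignores measurement error by treating the observed values as the true biomarker levels.

```python
from math import inf


def censoring_interval(visit_times, biomarker_values, threshold=126.0):
    """Return (left, right] bracketing the first observed threshold crossing.

    Assumes visit_times is sorted in increasing order and follow-up starts at 0.
    left  : last visit before the first value at or above the threshold
            (0.0 if the first visit is already at or above it)
    right : first visit with a value at or above the threshold, or inf if the
            threshold is never reached during follow-up (right censoring)
    """
    left = 0.0
    for t, y in zip(visit_times, biomarker_values):
        if y >= threshold:
            return (left, t)
        left = t
    return (left, inf)


# Hypothetical subject: fasting plasma glucose (mg/dl) at four yearly visits.
visits = [1.0, 2.0, 3.0, 4.0]
glucose = [98.0, 110.0, 131.0, 128.0]
print(censoring_interval(visits, glucose))  # (2.0, 3.0)
```

Here the crossing of 126 mg/dl is only known to lie between the second and third visits. A subject whose values never reach the threshold would yield an interval with an infinite right endpoint.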