Model checking and prediction with censored data

Last Modified
  • March 20, 2019
  • Chen, Li
    • Affiliation: Gillings School of Global Public Health, Department of Biostatistics
  • The class of semiparametric transformation models provides a very general framework for studying the effects of (possibly time-dependent) covariates on survival time and recurrent event times. Although many theoretical and methodological advances have been made for transformation models, methods for assessing the adequacy of these models have not been formally studied. In the first part of this dissertation, we introduce appropriate residuals for these models and consider the cumulative sums of the residuals. Under the assumed model, the cumulative-sum processes converge weakly to zero-mean Gaussian processes whose distributions can be approximated through Monte Carlo simulation. These results enable one to assess, both visually and numerically, how unusual the observed residual patterns are in reference to their null distributions. The residual patterns can also be used to determine the nature of model misspecification. Extensive simulation studies demonstrate that the proposed methods perform well in practical situations. A colon cancer study is provided for illustration.
Attributable fractions are commonly used to measure the impact of risk factors on disease incidence in the population. These static measures can be extended to functions of time when the time to disease occurrence or event time is of interest. In the second part of this dissertation, we deal with nonparametric and semiparametric estimation of attributable fraction functions for cohort studies with potentially censored event time data. The semiparametric models include the familiar proportional hazards model and a broad class of transformation models. The proposed estimators are shown to be consistent, asymptotically normal and asymptotically efficient. Extensive simulation studies demonstrate that the proposed methods perform well in practical situations. A cardiovascular health study is provided.
There is tremendous current interest in using multiple clinical and/or genetic factors to predict progression of disease. To determine which set of factors is most predictive, the predictive accuracy of multiple factors must be quantified. Existing measures focus on the proportion of variation explained by the factors. These measures are not easily interpreted and have rarely been used in clinical practice. In the third part of this dissertation, we develop measures of predictive accuracy based on the survival curves associated with different sets of predictors. Such measures extend positive and negative predictive values to time-to-event outcomes and multiple factors and have direct clinical relevance. We develop estimators for these measures under flexible censoring mechanisms. The proposed estimators are shown to be consistent and asymptotically normal. Simple Monte Carlo methods are developed to approximate the asymptotic distributions. Simulation studies show that the proposed methods perform well in practical situations. The Mayo primary biliary cirrhosis (PBC) study is provided for illustration.
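The idea of an attributable fraction function for censored event times, mentioned in the second part of the abstract, can be illustrated with a minimal sketch. It assumes one common definition, AF(t) = {F(t) - F0(t)} / F(t), where F(t) is the probability of an event by time t in the whole cohort and F0(t) is the same probability among the unexposed. Both are estimated here by a plain Kaplan-Meier estimator with no covariate adjustment. The function names `km_cdf` and `attributable_fraction` are hypothetical, and this is not claimed to be the estimator developed in the dissertation.

```python
import numpy as np

def km_cdf(time, event, t):
    """Kaplan-Meier estimate of F(t) = P(T <= t) from right-censored data.

    time  : observed follow-up times
    event : 1/True if the event was observed, 0/False if censored
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    surv = 1.0
    # Multiply (1 - d/n) over the distinct observed event times up to t
    for s in np.unique(time[event & (time <= t)]):
        at_risk = np.sum(time >= s)            # subjects still under observation at s
        deaths = np.sum((time == s) & event)   # events occurring exactly at s
        surv *= 1.0 - deaths / at_risk
    return 1.0 - surv

def attributable_fraction(time, event, exposed, t):
    """Assumed definition: AF(t) = {F(t) - F0(t)} / F(t), F0 from the unexposed."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    exposed = np.asarray(exposed, dtype=bool)
    f_all = km_cdf(time, event, t)
    f_unexposed = km_cdf(time[~exposed], event[~exposed], t)
    if f_all == 0.0:
        return float("nan")  # undefined before any event has occurred
    return (f_all - f_unexposed) / f_all
```

For a toy cohort with event times 1 through 8, no censoring, and exposure indicators `[1, 1, 1, 0, 1, 0, 0, 0]`, calling `attributable_fraction(time, event, exposed, 4)` gives approximately 0.5: half of the events occurring by time 4 would be attributed to the exposure under this unadjusted definition.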
Date of publication
Resource type
Rights statement
  • In Copyright
  • "... in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Biostatistics."
  • Zeng, Donglin
  • Lin, Danyu
Place of publication
  • Chapel Hill, NC
  • Open access
