Model Assessment for Models with Missing Data

Missing data commonly occur in a variety of study settings. In this dissertation, we first investigate three likelihood-based models for missing data in longitudinal studies: mixed-effects models, pattern mixture models (PMM), and selection models. Extensive simulations under ten missing data mechanisms are performed, with a focus on the treatment effect. The results suggest that no model consistently outperforms the others across the missing data mechanisms considered. However, the PMM using the treatment-specific proportion and the selection model provide some correction of the estimate compared with the mixed-effects model in several missing-not-at-random situations, even when the missing data mechanism does not exactly match the model assumption. Second, we focus on case-deletion diagnostic measures for general linear models (GLMs) with missing covariate data. Cook's distance is one of the most important diagnostic tools for identifying influential observations in parametric models. However, Cook's distances may not be directly comparable, because their scale depends stochastically on the degree of perturbation. We define the degree of perturbation for GLMs with missing covariates. We then derive Cook's distance based on the likelihood function and compare it with Cook's distance based on the Q-function used in the EM algorithm for models with missing data. We further develop a scaled Cook's distance for GLMs with missing covariate data, which resolves the size issue of Cook's distance. Simulated data are used to illustrate the size issue in GLMs with missing covariates. The application of scaled Cook's distances in a formal influence analysis is examined in simulations and real data examples. Finally, we examine the connection between case-deletion measures and cross-validation for GLMs with missing covariates.
Based on this connection, we develop case-deletion model complexity (CMC) measures for quantifying model complexity and case-deletion information criteria (CIC) for model selection. We develop these measures and criteria based on the likelihood function and on the Q-function, respectively. Some properties of CMC and CIC are investigated. Simulations and real data analyses show that CIC is a valuable tool for the analysis of models with missing data.
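
As background for the case-deletion idea mentioned above, the following is a minimal sketch of the classical Cook's distance for an ordinary least-squares fit with complete data. It is not the dissertation's method: it does not handle missing covariates, the Q-function, or the scaled version. The function name `cooks_distance` and the simulated data are assumptions made purely for illustration.

```python
import numpy as np


def cooks_distance(X, y):
    """Classical Cook's distance for OLS with complete data.

    Each case is deleted in turn and the model is refitted, so the
    code mirrors the case-deletion definition rather than the usual
    closed-form shortcut.
    """
    n, p = X.shape
    beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_full
    s2 = resid @ resid / (n - p)  # residual variance estimate
    XtX = X.T @ X
    d = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # drop case i
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        diff = beta_full - beta_i
        # Squared shift in the estimate, measured in the metric X'X
        d[i] = diff @ XtX @ diff / (p * s2)
    return d


# Illustrative simulated data (assumed, not taken from the dissertation)
rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
y[0] += 10.0  # contaminate one response

D = cooks_distance(X, y)
print(np.round(D, 3))
```

Refitting once per deleted case is slower than the closed-form expression in terms of residuals and leverages, but it makes the link between case deletion and leave-one-out cross-validation explicit.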