Marginalized two-part models for semicontinuous data with application to medical costs

In health services research, it is common to encounter semicontinuous data characterized by a point mass at zero followed by a right-skewed continuous distribution with positive support. Examples include health expenditures, in which the zeros represent a subpopulation of patients who do not use health services, while the continuous distribution describes the level of expenditures among health services users. Semicontinuous data are typically analyzed using two-part mixture models that separately model the probability of health services use and the distribution of positive expenditures among users. However, because the second part conditions on a nonzero response, conventional two-part models do not provide a marginal interpretation of covariate effects on the overall population of health services users and non-users, even though this is often of greatest interest to investigators. Here, we propose a marginalized two-part model that yields more interpretable effect estimates by parameterizing the model in terms of the marginal mean. This model maintains many of the important features of conventional two-part models, such as capturing zero-inflation and skewness, but allows investigators to examine covariate effects on the overall marginal mean, a target of primary interest in many applications. Using a simulation study, we examine properties of the maximum likelihood estimators from this model. We illustrate the approach by evaluating the effect of a behavioral weight loss intervention on health care expenditures in the Veterans Affairs (VA) health care system. We then extend this marginalized two-part model to clustered or longitudinal data structures by incorporating random effects. This longitudinal marginalized two-part model is fit following a fully Bayesian approach with non-informative or weakly informative prior distributions, and we illustrate it by analyzing the effect of a copayment increase in the VA health system.
Finally, using simulation studies, we compare the performance of the marginalized two-part model to commonly used one-part generalized linear models (GLMs) fit via quasi-likelihood estimation over a range of simulated data scenarios with varying percentages of zero-valued observations.