Bayesian Methods for Highly Correlated Exposures: an Application to Tap Water Disinfection By-Products and Spontaneous Abortion

Last Modified
  • March 20, 2019
  • MacLehose, Richard
    • Affiliation: Gillings School of Global Public Health, Department of Epidemiology
  • Highly correlated exposures are common in epidemiology. However, standard maximum likelihood techniques frequently fail to provide reliable estimates in the presence of highly correlated exposures. As a result, hierarchical regression methods are increasingly being used. Hierarchical regression places a prior distribution on the exposure-specific regression coefficients in order to stabilize estimates and incorporate prior knowledge. We examine three types of hierarchical models: semi-Bayes, fully-Bayes, and Dirichlet process priors. In the semi-Bayes approach, the prior mean and variance are treated as fixed constants chosen by the epidemiologist. An alternative is the fully-Bayes approach, which places hyperprior distributions on the mean and variance of the prior distribution so that the data can inform their values. Both of these approaches rely on a parametric specification for the exposure-specific coefficients. As a more flexible nonparametric option, one can use a Dirichlet process prior, which also clusters exposures into groups, effectively reducing dimensionality. We examine the properties of these three models and compare their mean squared error in simulated datasets. We use these hierarchical models to examine the relationship between disinfection by-products and spontaneous abortion. Spontaneous abortion is a common pregnancy outcome, although relatively little is known about its causes. Previous research has generally indicated an increased risk of spontaneous abortion among those who consume higher amounts of disinfection by-products. Right from the Start is a large multi-center cohort study of women who were followed through early pregnancy. Disinfection by-product concentrations were measured each week during the study, allowing for more precise exposure measurement than previous epidemiologic studies.
We focus our attention on the concentrations of 13 constituent disinfection by-products (4 trihalomethanes and 9 haloacetic acids), some of which are so highly correlated that conventional maximum likelihood estimates are unreliable. To allow simultaneous estimation of effects, we implement 4 Bayesian hierarchical models: semi-Bayes, fully-Bayes, Dirichlet process prior (DPP1), and Dirichlet process prior with a selection component (DPP2). Models that allowed prior parameters to be updated from the data tended to give far more precise coefficient estimates and to be more robust to prior specification. The DPP1 and DPP2 models were in close agreement in estimating no effect of any constituent disinfection by-product on spontaneous abortion. The fully-Bayes model largely agreed with the DPP1 and DPP2 models but had less precision, while the semi-Bayes model provided the least precise estimates.
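
The semi-Bayes idea described in the abstract, a prior with fixed mean and variance that stabilizes coefficients for highly correlated exposures, can be illustrated with a minimal sketch. This is not the dissertation's code or analysis: it assumes a linear outcome model with known noise variance (the study's outcome model is not specified here), simulated data, and hypothetical names and settings (`semi_bayes_posterior_mean`, `simulate_mse`, `rho`, the prior variance of 0.25). The true effects are set to zero, which equals the prior mean and therefore favors shrinkage.

```python
import numpy as np


def semi_bayes_posterior_mean(X, y, prior_mean, prior_var, noise_var=1.0):
    """Posterior mean of beta when y ~ N(X beta, noise_var * I) and each
    beta_j ~ N(prior_mean, prior_var), with prior_mean and prior_var fixed
    in advance (the semi-Bayes setting). Linear model is an assumption."""
    p = X.shape[1]
    # Posterior precision = data precision + prior precision.
    precision = X.T @ X / noise_var + np.eye(p) / prior_var
    # Precision-weighted combination of the data and the prior mean.
    rhs = X.T @ y / noise_var + np.full(p, prior_mean) / prior_var
    return np.linalg.solve(precision, rhs)


def simulate_mse(n_sims=200, n=100, p=5, rho=0.95, seed=0):
    """Average squared error of maximum likelihood (least squares) and
    semi-Bayes estimates over simulated datasets whose exposures have
    pairwise correlation rho. All settings are hypothetical."""
    rng = np.random.default_rng(seed)
    cov = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
    beta_true = np.zeros(p)  # assumption: no true effects
    err_ml = 0.0
    err_sb = 0.0
    for _ in range(n_sims):
        X = rng.multivariate_normal(np.zeros(p), cov, size=n)
        y = X @ beta_true + rng.normal(size=n)
        beta_ml = np.linalg.lstsq(X, y, rcond=None)[0]
        beta_sb = semi_bayes_posterior_mean(X, y, prior_mean=0.0, prior_var=0.25)
        err_ml += np.mean((beta_ml - beta_true) ** 2)
        err_sb += np.mean((beta_sb - beta_true) ** 2)
    return err_ml / n_sims, err_sb / n_sims


if __name__ == "__main__":
    mse_ml, mse_sb = simulate_mse()
    print(f"maximum likelihood MSE: {mse_ml:.4f}")
    print(f"semi-Bayes MSE:         {mse_sb:.4f}")
```

In this setup the semi-Bayes estimate has the lower mean squared error, because the prior pulls unstable coefficients toward its mean. A fully-Bayes version would instead place hyperpriors on `prior_mean` and `prior_var` rather than fixing them.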
Date of publication
Resource type
Rights statement
  • In Copyright
  • Kaufman, Jay S.
  • Doctor of Philosophy
Degree granting institution
  • University of North Carolina at Chapel Hill Graduate School
Graduation year
  • 2006
