Fixed effects inference for clustered data in Gaussian linear models

Last Modified
  • March 21, 2019
  • Johnson, Jacqueline Laurel
    • Affiliation: Gillings School of Global Public Health, Department of Biostatistics
  • Important public health research often requires community-based studies because of logistical, ethical, and cost constraints. Such designs require special methods of analysis. Gaussian clustered data are often analyzed with either a mixed effects linear model on individual-level data or a two-stage analysis of cluster means. For data with a large number of clusters and a large number of observations within each cluster, both techniques provide unbiased hypothesis tests. In small samples with unbalanced data, however, even moderate imbalance in cluster size across treatment groups can bias hypothesis tests in the two-stage analysis of cluster means. The use of large-sample approximations for one-stage mixed model test statistics in the analysis of small, unbalanced clustered experiments may also lead to inaccurate hypothesis tests. I derived a formulation of quadratic form theory that leads to a method for obtaining the exact test size for hypothesis tests in the two-stage model. This theory is used in an enumeration study of type I error for a test of treatment difference in the two-stage analysis of cluster means, where means are either unweighted or weighted by their cluster size. These enumerations focus on scenarios of imbalance common to non-randomized cluster data settings. Next, I performed a simulation study of type I error for a test of treatment difference in both the analysis of individual-level data and the analysis of cluster means, for scenarios of imbalance common to randomized clustered data trials. Ten methods were considered; of these, a two-stage analysis of cluster means with means weighted by their theoretical variance controlled type I error in the most cases. In this analysis, the weights contain restricted maximum likelihood estimates of variance components estimated from the individual-level data and are constrained to be positive.
Many clustered data studies currently show a misalignment between power calculations and data analysis; that is, the power analysis is done for a simplified version of the test actually computed. I showed how to perform an appropriate and valid power analysis for the preceding two-stage method and applied this to a study of adolescent drinking behavior.
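The two-stage analysis of cluster means mentioned in the abstract can be illustrated with a small sketch. This is not the dissertation's code: the function names, the record layout `(cluster_id, arm, value)`, and the 0/1 arm coding are hypothetical, and the statistic is the ordinary weighted least squares t statistic on cluster means, not the exact quadratic-form method or the variance-weighted method the author describes. It only shows the unweighted and cluster-size-weighted variants.

```python
from collections import defaultdict
from math import sqrt


def cluster_means(records):
    """Stage 1: reduce (cluster_id, arm, value) records to one mean per cluster.

    Returns a list of (arm, cluster_mean, cluster_size) tuples.
    """
    sums = defaultdict(lambda: [0.0, 0])  # cluster_id -> [running total, count]
    arms = {}                             # cluster_id -> treatment arm (0 or 1)
    for cluster_id, arm, value in records:
        sums[cluster_id][0] += value
        sums[cluster_id][1] += 1
        arms[cluster_id] = arm
    return [(arms[c], total / n, n) for c, (total, n) in sums.items()]


def two_stage_test(records, weight_by_size=False):
    """Stage 2: weighted least squares comparison of two arms on cluster means.

    Weights are 1 for every cluster (unweighted) or the cluster size n_j.
    Returns (difference in arm means, t statistic, degrees of freedom).
    """
    means = cluster_means(records)

    def weight(n):
        return float(n) if weight_by_size else 1.0

    wsum = {0: 0.0, 1: 0.0}   # total weight per arm
    wysum = {0: 0.0, 1: 0.0}  # weighted sum of cluster means per arm
    for arm, ybar, n in means:
        wsum[arm] += weight(n)
        wysum[arm] += weight(n) * ybar
    arm_mean = {a: wysum[a] / wsum[a] for a in (0, 1)}

    # Weighted residual sum of squares around each arm's weighted mean.
    rss = sum(weight(n) * (ybar - arm_mean[arm]) ** 2 for arm, ybar, n in means)
    df = len(means) - 2  # number of clusters minus two arm means
    sigma2 = rss / df

    diff = arm_mean[1] - arm_mean[0]
    se = sqrt(sigma2 * (1.0 / wsum[0] + 1.0 / wsum[1]))
    return diff, diff / se, df
```

Under the usual working assumptions, the returned statistic would be referred to a t distribution with the returned degrees of freedom. The abstract's point is that with small, unbalanced designs the true test size of such a procedure can differ from the nominal level, which is why the choice of weights matters.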
Date of publication
Resource type
Rights statement
  • In Copyright
  • Catellier, Diane J.
  • Open access