A Monte Carlo Study of Several Different Approaches to the Behrens–Fisher Problem

Last Modified
  • March 20, 2019
Creator
  • Melendez, Carlos
    • Affiliation: School of Education
Abstract
  • One of the tests most often used to compare the means of two independent groups is the pooled t-test (i.e., Student’s t-test or the classical t-test). However, the validity of pooled t-test results depends on certain assumptions, including homogeneity of variance (HOV). In the statistical literature, the problem of comparing two means when HOV cannot be assumed is known as the Behrens–Fisher problem. The purpose of this dissertation was to compare the Type I error-rate performance and statistical power of five solutions to the Behrens–Fisher problem under several simulated conditions. The methods studied were the pooled t-test, the Cochran–Cox t-test, the t-test with the Welch–Satterthwaite correction, and two nonparametric bootstrap methods: one using the Efron and Tibshirani (1993) approach and one using a modified version of the Good (2005) approach. In this study, computer-simulated data from a normal population with mean 0 and variance 1 were used to compare the achieved significance level (p-value) and power of the five methods for testing the mean difference of two groups when HOV could not be assumed. Only three methods consistently yielded accurate p-values for a two-tailed hypothesis test: the Cochran–Cox t-test, the nonparametric bootstrap method using the Efron–Tibshirani approach, and the Welch–Satterthwaite approximate t-test. These methods controlled the Type I error rate, in that their empirical significance levels did not differ statistically from the nominal significance level (α). Of these, only the Welch–Satterthwaite approximate t-test controlled the Type I error rate in all of the studied conditions.
In addition, in almost all of the simulated conditions, the Welch–Satterthwaite approximate t-test was slightly more powerful than the other methods. However, in the special cases in which the sample sizes were equal or the variances were equal, the pooled t-test controlled the Type I error rate in almost all instances, and in those cases it was usually also the most powerful method for detecting mean differences. This study provides no compelling evidence that any method other than the Welch–Satterthwaite approximate t-test offers a better solution to the Behrens–Fisher problem, except when the sample sizes are equal. Given this evidence, of the five methods evaluated, I recommend the Welch–Satterthwaite approximate t-test when the samples have been drawn from normally distributed populations, the sample sizes are unequal, and it is uncertain whether the population variances are equal. When the sample sizes are equal or the variances are equal, I recommend the pooled t-test.
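The three t-type procedures compared in the abstract differ mainly in how they standardize the mean difference and which critical value they use. The following is a minimal Python sketch (standard library only) of the textbook formulas: the pooled t statistic on n1 + n2 - 2 degrees of freedom, Welch's t with the Welch–Satterthwaite approximate degrees of freedom, and the Cochran–Cox critical value, a weighted average of the per-group t critical values with weights s_i^2 / n_i. The function names, the toy data, and the idea of passing table critical values in by hand are illustrative assumptions. This is not the code used in the dissertation.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Student's pooled-variance t statistic and its degrees of freedom (n1 + n2 - 2)."""
    n1, n2 = len(x), len(y)
    # Pooled variance: each sample variance weighted by its degrees of freedom.
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(x, y):
    """Welch t statistic with Welch-Satterthwaite approximate degrees of freedom."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2  # squared standard errors of each mean
    t = (mean(x) - mean(y)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def cochran_cox_critical(x, y, t1_crit, t2_crit):
    """Cochran-Cox critical value for the Welch t statistic.

    t1_crit and t2_crit are t critical values with n1 - 1 and n2 - 1 degrees
    of freedom (e.g. from a t table); they are averaged with weights s_i^2 / n_i.
    """
    v1, v2 = variance(x) / len(x), variance(y) / len(y)
    return (v1 * t1_crit + v2 * t2_crit) / (v1 + v2)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 6, 8, 10, 12, 14]
    t_p, df_p = pooled_t(x, y)
    t_w, df_w = welch_t(x, y)
    # Two-tailed alpha = .05 table values for df = 4 and df = 6.
    cc = cochran_cox_critical(x, y, 2.776, 2.447)
    print(f"pooled t = {t_p:.3f} on {df_p} df")
    print(f"Welch t = {t_w:.3f} on {df_w:.2f} df")
    print(f"Cochran-Cox: reject H0 if |t| > {cc:.3f}")
```

The Welch–Satterthwaite degrees of freedom always lie between min(n1, n2) - 1 and n1 + n2 - 2; for these toy data they are about 8.04, versus 10 for the pooled test.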
Date of publication
Resource type
Rights statement
  • In Copyright
Advisor
  • Schwartz, Todd
  • Greene, Jeffrey
  • Houck, Eric
  • Ware, William
  • Lynn, Mary
Degree
  • Doctor of Philosophy
Degree granting institution
  • University of North Carolina at Chapel Hill Graduate School
Graduation year
  • 2016
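The abstract names a nonparametric bootstrap test following Efron and Tibshirani (1993) but does not spell out the procedure. The following is a minimal Python sketch, assuming the common shift-to-a-common-mean bootstrap test for equal means that does not assume equal variances. Each group is shifted to the combined mean so that the null hypothesis holds, each group is resampled separately with replacement, and the resampled Welch-type statistics are compared with the observed one. The function names, the default of 2000 resamples, and the two-sided rule (counting |t*| >= |t_obs|) are illustrative assumptions. The dissertation's exact implementation and its modified Good (2005) variant are not reproduced here.

```python
import random
from math import sqrt
from statistics import mean, variance


def welch_stat(x, y):
    """Studentized mean difference that does not pool the variances.

    Returns None when both variances are zero (a degenerate resample).
    """
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return None if se == 0 else (mean(x) - mean(y)) / se


def bootstrap_mean_test(x, y, n_boot=2000, seed=None):
    """Two-sided bootstrap p-value for H0: equal means, unequal variances allowed."""
    rng = random.Random(seed)
    t_obs = welch_stat(x, y)
    if t_obs is None:
        raise ValueError("both samples have zero variance")
    mx, my = mean(x), mean(y)
    z_bar = mean(list(x) + list(y))
    # Shift both groups to the combined mean so H0 holds, keeping each group's spread.
    x_null = [xi - mx + z_bar for xi in x]
    y_null = [yi - my + z_bar for yi in y]
    extreme = valid = 0
    for _ in range(n_boot):
        # Resample each group separately, with replacement.
        t_b = welch_stat(rng.choices(x_null, k=len(x)), rng.choices(y_null, k=len(y)))
        if t_b is None:
            continue  # skip the rare resample in which both groups are constant
        valid += 1
        if abs(t_b) >= abs(t_obs):
            extreme += 1
    return extreme / valid


if __name__ == "__main__":
    x = [float(v) for v in range(10)]
    y = [20.0 + 2 * v for v in range(10)]
    print(bootstrap_mean_test(x, y, n_boot=999, seed=1))
```

Resampling each group on its own, rather than pooling the two samples, is what lets the shifted groups keep their own variances under the null hypothesis.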
