Last Modified
  • March 20, 2019
  • Cai, Tianji
    • Affiliation: College of Arts and Sciences, Department of Sociology
  • When a sampling design is correlated with the dependent variable, the distribution of the sampled units differs from that obtained under simple random sampling. The design is then informative: if the design variables are not included in the analysis model, the estimated model parameters can be biased, even conditional on the covariates. This raises the question of how survey data should be modeled when the sampling design is informative. Two fundamental methodologies, design-based and model-based, have been proposed to address this issue. One model-based method, the sample distribution method proposed by Krieger and Pfeffermann (1992; 1997), derives the model for the sample data as a function of the model holding in the population and the sampling design. Once the model holding for the sample data is derived, standard model-based analysis techniques can be applied to estimate the unknown population parameters. This dissertation assesses various modeling strategies and estimators of regression coefficients and their variances, both design-based and model-based and in particular the sample distribution method, under informative sampling designs, and develops a modeling strategy for analysts facing the choice between design-based and model-based approaches. The dissertation comprises three research papers that provide 1) an evaluation of the design-based and model-based estimators under a single-stage informative sampling design; 2) an assessment of design-based and model-based estimators under an informative two-stage cluster sampling design; and 3) a joint treatment of informative sampling and unit dropout in longitudinal studies. When a single-stage sampling design is informative, the naïve model-based method (either ordinary least squares or maximum likelihood) produces biased results. The design-based method reduces the bias for some parameters (e.g., the intercept) but increases variances, which may lead to overly conservative conclusions. The sample distribution method produces better estimates, with smaller biases and variances than the naïve and design-based methods. Under an informative two-stage cluster sampling design, the naïve model-based method, which ignores the sampling effect, produces biased results. Under some specific assumptions, the sample distribution method produces better estimators, in terms of smaller biases and higher coverage rates, than the naïve method and the design-based multilevel pseudo-likelihood method. Although many previous studies have shown that the multilevel pseudo-likelihood method is preferred for compensating for the sampling design, this study shows that a simpler method, the sample distribution method, can be used to address the design effect. Finally, the relative performance of the design-based and model-based methods in compensating for informative sampling and dropout is investigated in a specific statistical setting. The simulation results indicate that both the model-based and the design-based approaches generally work well in the missing-at-random and missing-not-at-random settings. Moreover, the sample distribution method combined with the Diggle and Kenward model has the advantage of correcting for both the design effect and nonignorable dropout.
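The single-stage result described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the dissertation: the population model (y = 1 + 2x + noise), the population and sample sizes, and the inclusion rule (Poisson sampling with probability proportional to exp(0.3·y)) are all assumptions chosen for illustration. It compares the naïve ordinary least squares fit with a design-weighted fit that uses inverse inclusion probabilities as weights; it does not implement the sample distribution method.

```python
import numpy as np


def simulate(seed=0, pop_size=100_000, expected_n=2_000, a=0.3):
    """Draw one informative sample and fit naive and design-weighted OLS.

    All settings here are illustrative assumptions, not the
    dissertation's simulation design.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical population model: y = 1 + 2x + e, with e ~ N(0, 1).
    x = rng.normal(size=pop_size)
    y = 1.0 + 2.0 * x + rng.normal(size=pop_size)

    # Informative design: inclusion probability grows with y itself.
    size_measure = np.exp(a * y)
    pi = np.minimum(expected_n * size_measure / size_measure.sum(), 1.0)

    # Poisson sampling: each unit is selected independently.
    sampled = rng.random(pop_size) < pi
    xs, ys = x[sampled], y[sampled]
    weights = 1.0 / pi[sampled]

    design = np.column_stack([np.ones_like(xs), xs])

    # Naive fit: ordinary least squares that ignores the design.
    naive, *_ = np.linalg.lstsq(design, ys, rcond=None)

    # Design-weighted fit: weighted least squares, implemented by
    # scaling rows with the square root of the sampling weights.
    root_w = np.sqrt(weights)
    weighted, *_ = np.linalg.lstsq(
        design * root_w[:, None], ys * root_w, rcond=None
    )
    return naive, weighted


naive, weighted = simulate()
print("naive    (intercept, slope):", naive)
print("weighted (intercept, slope):", weighted)
```

Under these assumed settings the naïve intercept is pulled away from the population value of 1 while the slope is largely unaffected, and the weighted fit recovers the intercept at the cost of more variable weights. This matches the qualitative pattern the abstract reports for the intercept.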
Date of publication
Resource type
Rights statement
  • In Copyright
  • Guo, Guang
  • Doctor of Philosophy
Degree granting institution
  • University of North Carolina at Chapel Hill Graduate School
Graduation year
  • 2010
  • This item is restricted from public view for 1 year after publication.