Methods and approaches for evaluating the validity of latent class models with applications

Last Modified
  • March 20, 2019
Creator
  • Berzofsky, Marcus
    • Affiliation: Gillings School of Global Public Health, Department of Biostatistics
Abstract
  • This dissertation focuses on methods for assessing classification error in sensitive, categorical outcomes using latent class analysis (LCA) and Markov latent class analysis (MLCA). Quantifying classification error in a survey is critical to understanding the quality of its estimates. Classification error (measurement error for categorical outcomes) is defined as the difference between the true value of a measurement and the value obtained during the measurement process. For dichotomous outcomes there are two types of classification error: a false positive (the response is affirmative when the negative is correct) and a false negative (the response is negative when the affirmative is correct). A sensitive outcome is an event for which respondents have a negligible probability of giving a false positive response. Such events are central to the study of socially undesirable phenomena (e.g., alcohol abuse, sexual misconduct, or drug abuse). LCA, for cross-sectional data, and MLCA, for panel or longitudinal data, are modeling techniques that use repeated measurements of the same outcome, rather than an error-free (gold standard) measurement, to estimate the true prevalence of the outcome and its classification error rates. The dissertation has three parts. First, I assess how local dependence, a violation of local independence (the key assumption in LCA), affects classification error estimates. I use simulation to measure this impact, then develop an approach to correct for local dependence during the modeling process and apply it to data from the National Inmate Survey. Second, I examine whether time-varying grouping variables (variables that do not change in a linear fashion over time) can be incorporated more parsimoniously in a Markov latent class (MLC) model. I develop a process for testing whether time-invariant summary variables can replace them without degrading model fit, and then determine whether such summary variables are appropriate for the National Crime Victimization Survey (NCVS). Third, I estimate classification error rates for the NCVS, developing a process to ensure that the model estimates satisfy all model assumptions. I find that the NCVS has a large amount of classification error and that its published estimates are negatively biased by 2.5%.
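The core idea of LCA, recovering true prevalence and false positive/false negative rates from several error-prone repeated measurements without a gold standard, can be illustrated with a minimal sketch. The Python code below is not the dissertation's method or software. It assumes the simplest case: one binary outcome, two latent classes, three repeated yes/no indicators, and local independence, with the model fit by the EM algorithm. The function names `simulate` and `fit_lca` and all parameter values are hypothetical, chosen for illustration only; they are not NCVS or National Inmate Survey figures.

```python
import random
from collections import Counter


def simulate(n, prevalence, fn_rates, fp_rates, seed=0):
    """Draw n respondents, each answering len(fn_rates) repeated yes/no items.

    A respondent whose true status is 1 answers 'no' to item k with
    probability fn_rates[k] (false negative); one whose true status is 0
    answers 'yes' with probability fp_rates[k] (false positive). Items are
    drawn independently given true status (local independence, assumed).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        truth = rng.random() < prevalence
        answers = []
        for fn, fp in zip(fn_rates, fp_rates):
            p_yes = 1 - fn if truth else fp
            answers.append(1 if rng.random() < p_yes else 0)
        data.append(tuple(answers))
    return data


def fit_lca(data, n_iter=2000, tol=1e-10):
    """EM for a two-class latent class model with dichotomous indicators."""
    counts = Counter(data)  # collapse respondents into response patterns
    n = len(data)
    k = len(data[0])
    # Start values: the 'true yes' class is the minority and items are mostly accurate.
    pi, sens, fpr = 0.3, [0.8] * k, [0.1] * k
    for _ in range(n_iter):
        # E-step: posterior probability that true status is 1, per response pattern.
        post = {}
        for pattern in counts:
            l1, l0 = pi, 1 - pi
            for y, s, f in zip(pattern, sens, fpr):
                l1 *= s if y else 1 - s
                l0 *= f if y else 1 - f
            post[pattern] = l1 / (l1 + l0)
        # M-step: update parameters from expected class counts.
        n1 = sum(c * post[p] for p, c in counts.items())
        new_pi = n1 / n
        new_sens = [sum(c * post[p] for p, c in counts.items() if p[j]) / n1
                    for j in range(k)]
        new_fpr = [sum(c * (1 - post[p]) for p, c in counts.items() if p[j]) / (n - n1)
                   for j in range(k)]
        change = abs(new_pi - pi) + sum(
            abs(a - b) for a, b in zip(new_sens + new_fpr, sens + fpr))
        pi, sens, fpr = new_pi, new_sens, new_fpr
        if change < tol:
            break
    return {"prevalence": pi,
            "false_negative": [1 - s for s in sens],
            "false_positive": fpr}


if __name__ == "__main__":
    # Hypothetical values: 15% true prevalence, high false negative rates,
    # near-zero false positive rates (a "sensitive" outcome).
    data = simulate(20000, 0.15, [0.30, 0.20, 0.25], [0.01, 0.02, 0.01], seed=1)
    est = fit_lca(data)
    naive = sum(d[0] for d in data) / len(data)
    print(f"single-item yes rate: {naive:.3f}")
    print(f"LCA prevalence:       {est['prevalence']:.3f}")
    print("false negative rates:", [round(x, 3) for x in est["false_negative"]])
    print("false positive rates:", [round(x, 3) for x in est["false_positive"]])
```

For a single item, the expected "yes" rate is π(1 - θ_FN) + (1 - π)θ_FP, which falls below the true prevalence π whenever (1 - π)θ_FP < πθ_FN, as it does for a sensitive outcome dominated by false negatives. The latent class fit instead recovers π from the pattern of agreement among the items. With only three indicators this two-class model is just identified, and if the items are locally dependent its estimates can be biased, which is the problem the first part of the dissertation examines.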
Rights statement
  • In Copyright
Note
  • "... in partial fulfillment of the requirements for the degree of Doctor of Public Health in the School of Public Health (Biostatistics)."
Advisor
  • Biemer, Paul P.
Place of publication
  • Chapel Hill, NC
  • Open access
