Secondary Analysis in Outcome-Dependent-Sampling Designs

Last Modified
  • March 20, 2019
  • Pan, Yinghao
    • Affiliation: Gillings School of Global Public Health, Department of Biostatistics
  • Outcome-dependent sampling (ODS) schemes have long been used to reduce the cost of epidemiological studies. In ODS designs, the exposure and covariates are observed with a probability that depends on the outcome variable. Popular ODS designs include the case-control design for binary outcomes and the case-cohort design for time-to-event outcomes. Most studies have multiple endpoints of interest in addition to the primary outcome, so investigators often need to reuse the already collected data to evaluate the association between a secondary outcome and the covariates. This is referred to as secondary analysis. However, performing secondary analysis in ODS designs is not straightforward, because ODS data are not representative of the general population. In this dissertation, we study how to correctly and efficiently conduct secondary analysis in ODS designs. In the first part, we consider analyzing a secondary outcome in case-cohort studies. We propose a maximum estimated likelihood approach, where the likelihood is based on jointly modeling the time-to-failure outcome and the secondary outcome. We show that the proposed estimated likelihood estimator is statistically more efficient than two inverse probability weighted type estimators. We apply our method to data from the Sister Study. In the second part of the dissertation, we investigate how to properly analyze a secondary outcome under the ODS scheme discussed in Zhou et al. (2002). In this ODS design, supplemental samples are taken from different strata of the continuous outcome variable in addition to a simple random sample. We do not make any parametric assumptions about the outcome variables and specify only the form of the regression mean. Inverse probability weighted (IPW) and augmented inverse probability weighted (AIPW) estimating equations are proposed to conduct secondary analysis. Data from the Collaborative Perinatal Project (CPP) are used to illustrate our method.
Finally, we propose efficient secondary analysis techniques for data from two-phase studies. The general two-phase sampling design includes case-cohort, generalized case-cohort, and two-phase survival outcome-dependent sampling (SODS) designs as special cases. We develop a restricted maximum likelihood estimator based on the empirical likelihood function of the data. We apply our method to a data set from the Norwegian Mother and Child Cohort Study (MoBa).
Date of publication
Resource type
Rights statement
  • In Copyright
  • Zeng, Donglin
  • Longnecker, Matthew
  • Cai, Jianwen
  • Zhou, Haibo
  • Herring, Amy
  • Doctor of Philosophy
Degree granting institution
  • University of North Carolina at Chapel Hill Graduate School
Graduation year
  • 2017
