## Statistical inferences for outcome dependent sampling design with multivariate outcomes

• March 22, 2019
Creator
• Lu, Tsui-Shan
• Affiliation: Gillings School of Global Public Health, Department of Biostatistics
Abstract
• An outcome-dependent sampling (ODS) design has been shown to be a cost-effective sampling scheme. In an ODS design with a continuous outcome variable, one observes the exposure with a probability, possibly unknown, that depends on the outcome. In practice, multivariate data arise in many contexts, such as longitudinal data or cluster units. While the ODS design has attracted interest in the statistical and applied literature, statistical inference procedures for such designs with multivariate outcomes remain undeveloped. We develop a general sampling design and inference methods using ODS under continuous multivariate settings (Multivariate-ODS). Standard estimation methods for multivariate data that ignore the Multivariate-ODS design yield biased and inconsistent estimates. Therefore, new statistical methods are needed to reap the benefits of a Multivariate-ODS design. In this dissertation, we propose three commonly occurring ODS sampling strategies and study new semiparametric methods for estimating regression parameters. All three sampling strategies allow for a simple random sample (SRS); they differ in how the supplemental samples are selected. The first design, the Multivariate-ODS with a maximum selection criterion, selects the supplemental sample based on whether the maximum of an individual's outcome values exceeds a known cutpoint; the second design, the Multivariate-ODS with a summation criterion, draws the supplemental sample based on whether the sum of an individual's outcome values exceeds a given cutpoint; the third design, the Multivariate-ODS with a general criterion, is a more general design in which the selection of the supplemental samples is based on each individual's responses rather than on an aggregate of the outcomes. The proposed estimators are semiparametric in the sense that the underlying distributions of the covariates are left unspecified and modeled nonparametrically using empirical likelihood methods.
We show that the proposed estimators are consistent and asymptotically normal. Simulation studies illustrate that the proposed estimators are more efficient than other competing estimators. We also apply the proposed methods to a real data study. The results of these applications support our claim that substantial efficiency gains can be achieved by the Multivariate-ODS design, which provides an efficient alternative for conducting multivariate studies.
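The first two sampling strategies described in the abstract can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the dissertation's procedure: the data-generating model, the function names `simulate_population` and `multivariate_ods_sample`, the sample sizes, and the cutpoints are all hypothetical, and the supplemental sample is assumed to be drawn from subjects not already in the SRS. It shows only how subjects might be selected, not the semiparametric estimation.

```python
import random


def simulate_population(n, n_outcomes=3, beta=0.5, seed=0):
    """Hypothetical population: one exposure x and several correlated continuous outcomes."""
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        shared = rng.gauss(0.0, 1.0)  # subject-level effect that correlates the outcomes
        y = [beta * x + shared + rng.gauss(0.0, 1.0) for _ in range(n_outcomes)]
        population.append((x, y))
    return population


def multivariate_ods_sample(population, n_srs, n_supp, cutpoint, criterion, seed=1):
    """Draw an SRS, then a supplemental sample among the remaining subjects
    whose criterion(outcomes) exceeds the cutpoint (assumed design detail)."""
    rng = random.Random(seed)
    indices = list(range(len(population)))
    srs = rng.sample(indices, n_srs)
    in_srs = set(srs)
    eligible = [
        i for i in indices
        if i not in in_srs and criterion(population[i][1]) > cutpoint
    ]
    supplemental = rng.sample(eligible, min(n_supp, len(eligible)))
    return srs, supplemental


population = simulate_population(5000)

# Maximum selection criterion: max of a subject's outcomes exceeds the cutpoint.
srs_max, supp_max = multivariate_ods_sample(
    population, n_srs=200, n_supp=100, cutpoint=2.0, criterion=max
)

# Summation criterion: sum of a subject's outcomes exceeds the cutpoint.
srs_sum, supp_sum = multivariate_ods_sample(
    population, n_srs=200, n_supp=100, cutpoint=4.0, criterion=sum
)
```

Because the supplemental subjects are chosen on their outcomes, the combined sample over-represents high-outcome individuals, which is why, as the abstract notes, standard methods that ignore the design give biased estimates.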