Distinguishing Death from Disenrollment in Claims Data Using a Readily Implemented Machine Learning Algorithm

Last Modified
  • October 8, 2021
  • Young, Jessica
    • Affiliation: University of North Carolina at Chapel Hill
  • Yoon, Frank
    • Other Affiliation: IBM Watson Health
  • Dasgupta, Nabarun
    • Affiliation: Injury Prevention Research Center
  • Irwin, Debra
    • Other Affiliation: IBM Watson Health
  • Pack, Kenneth
    • Other Affiliation: IBM Watson Health
  • Cooper, Toska
    • Affiliation: Injury Prevention Research Center
  • Shiv, Shalu
    • Other Affiliation: IBM Watson Health
  • Bloemers, Sarah
    • Other Affiliation: IBM Watson Health
  • Gibson, Teresa
    • Other Affiliation: IBM Watson Health
  • Background: The inability to identify dates of death in insurance claims data is a major limitation of retrospective claims-based research. When death is not the outcome of interest, it is a competing risk and threatens validity if treated as non-informative right censoring. Objectives: We aim to develop a user-friendly public algorithm to predict death within the year of disenrollment using an administrative claims database. Methods: We identified adults (18+ years) with at least 2 years of continuous enrollment prior to disenrollment between 01/2007 and 01/2018. Leveraging unique linkages and data that are typically unavailable in the publicly licensed data, we ascertained date of death from the Social Security Death Index, inpatient discharge status, and death indicators in the administrative data. Models including candidate predictors for age, sex, Census region, month of disenrollment, year of disenrollment, chronic condition indicators (components of the Elixhauser score), and prior healthcare utilization were estimated using elastic net regression tuned by 5-fold cross-validation, and final models were evaluated in an independent testing set. A weighted analysis adjusted for the rare outcome (i.e., class imbalance). Sensitivity, specificity, and ROC associated with various thresholds of predicted probability to classify death at disenrollment were calculated. Results: Overall, we identified 13,360,460 beneficiaries who disenrolled during the study period, 5% of whom died within the year of disenrollment. The strongest predictors of death were age at disenrollment, diagnosis of metastatic cancer in the year prior to death, and type of care received (e.g., inpatient stay, hospice care). Using a prediction threshold of 30%, the algorithm classified death at disenrollment with a sensitivity of 0.684 and a specificity of 0.985 (ROC=0.97). At the same prediction threshold, the weighted algorithm classified death with a sensitivity of 0.947 and a specificity of 0.898 (ROC=0.973). Conclusions: Our algorithm uses publicly defined chronic conditions and utilization patterns that are easy to implement in claims data and predicts death at disenrollment with high specificity and varying sensitivity depending on the chosen prediction threshold. Users can easily implement the algorithm and can choose the prediction threshold (balancing sensitivity and specificity) to meet the needs of the specific study at hand.
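The trade-off between sensitivity and specificity at a chosen prediction threshold can be illustrated with a minimal sketch. This is not the authors' code and does not reproduce their elastic net model. The function name `sensitivity_specificity` and the toy probabilities and outcomes are hypothetical, chosen only to show how a threshold (such as the 30% used in the abstract) turns predicted probabilities of death into a classification.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    probs: Sequence[float],
    died: Sequence[int],
    threshold: float = 0.30,
) -> Tuple[float, float]:
    """Classify death when the predicted probability is >= threshold.

    Returns (sensitivity, specificity) against the observed outcomes,
    where died[i] is 1 for death and 0 for disenrollment without death.
    """
    if len(probs) != len(died):
        raise ValueError("probs and died must have the same length")

    tp = fn = tn = fp = 0
    for p, y in zip(probs, died):
        predicted_death = p >= threshold
        if y == 1:
            if predicted_death:
                tp += 1  # death correctly flagged
            else:
                fn += 1  # death missed
        else:
            if predicted_death:
                fp += 1  # survivor wrongly flagged as death
            else:
                tn += 1  # survivor correctly left unflagged

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one death and one non-death")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data (not from the study): predicted probabilities of
# death and the observed outcome for eight disenrolled beneficiaries.
toy_probs = [0.05, 0.10, 0.20, 0.35, 0.45, 0.60, 0.80, 0.95]
toy_died = [0, 0, 1, 0, 0, 1, 1, 1]

for t in (0.10, 0.30, 0.50):
    sens, spec = sensitivity_specificity(toy_probs, toy_died, t)
    print(f"threshold={t:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy example, raising the threshold flags fewer people as dead, so specificity can only stay the same or rise and sensitivity can only stay the same or fall. This is the balance a user of such an algorithm would tune to the needs of a given study.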
Rights statement
  • In Copyright
