Distinguishing Death from Disenrollment in Claims Data Using a Readily Implemented Machine Learning Algorithm

Young, Jessica; Yoon, Frank; Dasgupta, Nabarun; Irwin, Debra; Pack, Kenneth; Cooper, Toska; Shiv, Shalu; Bloemers, Sarah; Gibson, Teresa

Last Modified

October 8, 2021

Creator

Young, Jessica
- Affiliation: University of North Carolina at Chapel Hill
Yoon, Frank
- Other Affiliation: IBM Watson Health
Dasgupta, Nabarun
- Affiliation: Injury Prevention Research Center
Irwin, Debra
- Other Affiliation: IBM Watson Health
Pack, Kenneth
- Other Affiliation: IBM Watson Health
Cooper, Toska
- Affiliation: Injury Prevention Research Center
Shiv, Shalu
- Other Affiliation: IBM Watson Health
Bloemers, Sarah
- Other Affiliation: IBM Watson Health
Gibson, Teresa
- Other Affiliation: IBM Watson Health

Abstract

Background: The inability to identify dates of death in insurance claims data is a major limitation of retrospective claims-based research. When death is not the study outcome, it is a competing risk, and treating it as non-informative right censoring threatens validity.

Objectives: We aimed to develop a user-friendly, public algorithm to predict death within the year of disenrollment using an administrative claims database.

Methods: We identified adults (18+ years) with at least 2 years of continuous enrollment prior to disenrollment between 01/2007 and 01/2018. Leveraging unique linkages and data that are typically unavailable in the publicly licensed database, we ascertained date of death from the Social Security Death Index, inpatient discharge status, and death indicators in the administrative data. Models including candidate predictors for age, sex, Census region, month of disenrollment, year of disenrollment, chronic condition indicators (components of the Elixhauser score), and prior healthcare utilization were estimated using elastic net regression tuned by 5-fold cross-validation, and final models were evaluated in an independent testing set. A weighted analysis adjusted for the rare outcome (i.e., class imbalance). Sensitivity and specificity were calculated at various thresholds of predicted probability for classifying death at disenrollment, along with the area under the ROC curve (AUC).

Results: Overall, we identified 13,360,460 beneficiaries who disenrolled during the study period, 5% of whom died within the year of disenrollment. The strongest predictors of death were age at disenrollment, diagnosis of metastatic cancer in the year prior to death, and type of care received (e.g., inpatient stay, hospice care). Using a prediction threshold of 30%, the algorithm classified death at disenrollment with a sensitivity of 0.684 and a specificity of 0.985 (AUC = 0.97). At the same prediction threshold, the weighted algorithm classified death with a sensitivity of 0.947 and a specificity of 0.898 (AUC = 0.973).

Conclusions: Our algorithm uses publicly defined chronic conditions and utilization patterns that are easy to implement in claims data, and it predicts death at disenrollment with high specificity and a sensitivity that varies with the chosen prediction threshold. Users can easily implement the algorithm and choose the prediction threshold, balancing sensitivity and specificity, to meet the needs of the study at hand.
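The threshold trade-off described above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code and does not fit the elastic net itself; it only scores already-predicted probabilities. The function names `sensitivity_specificity`, `roc_auc`, and `balanced_weights` are hypothetical, the toy labels and probabilities are invented, and the inverse-class-frequency ("balanced") weighting is one common way to handle class imbalance, assumed here because the abstract does not state the weighting scheme.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], p_death: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Classify a disenrollee as deceased when the predicted probability
    is at or above `threshold`; return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, p_death):
        predicted_death = p >= threshold
        if y == 1:
            if predicted_death:
                tp += 1
            else:
                fn += 1
        elif predicted_death:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one death and one non-death")
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], p_death: Sequence[float]) -> float:
    """Threshold-free AUC via the Mann-Whitney rank statistic: the
    probability that a random death outranks a random non-death."""
    n = len(p_death)
    order = sorted(range(n), key=lambda i: p_death[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and p_death[order[j + 1]] == p_death[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank shared by the ties
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one death and one non-death")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def balanced_weights(y_true: Sequence[int]) -> List[float]:
    """Hypothetical inverse-class-frequency weights: each class gets equal
    total weight (n / 2), which up-weights the rare death outcome."""
    n = len(y_true)
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = n - n_pos
    return [n / (2 * n_pos) if y == 1 else n / (2 * n_neg) for y in y_true]


# Toy data (invented): 1 = died within the year of disenrollment.
labels = [1, 1, 0, 0, 0]
probs = [0.9, 0.4, 0.35, 0.1, 0.5]
print(sensitivity_specificity(labels, probs, 0.30))  # sensitivity 1.0, specificity ~0.333
print(sensitivity_specificity(labels, probs, 0.60))  # sensitivity 0.5, specificity 1.0
print(roc_auc(labels, probs))  # ~0.833
print(balanced_weights(labels))
```

Raising the threshold from 0.30 to 0.60 in the toy data lowers sensitivity and raises specificity, which is the trade-off users weigh when choosing a threshold; the AUC does not depend on the threshold. The rank-based AUC avoids comparing every death with every non-death, which matters when scoring millions of disenrollees.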

Date of publication

August 23, 2021

DOI

https://doi.org/10.17615/42zz-g634

Rights statement

In Copyright

Language

English

Items

DisenrollmentDeath_ICPE.pdf (uploaded 2021-10-07; public)