Abstract
Background
The lack of body mass index (BMI) measurements limits the utility of claims data for bariatric surgery research, but pre-operative BMI may be imputable because claims contain weight-related diagnosis codes and reimbursement for bariatric surgery carries BMI-related requirements. We used a machine learning pipeline to create a claims-based scoring system to predict pre-operative BMI, as documented in the electronic health record (EHR), among patients undergoing a new bariatric surgery.
Methods
Using the Optum Labs Data Warehouse, which contains linked de-identified claims and EHR data for commercial or Medicare Advantage enrollees, we identified adults undergoing a new bariatric surgery between January 2011 and June 2018 who had a BMI measurement in linked EHR data ≤30 days before the index surgery (n=3226). We constructed predictors from claims data and applied a machine learning pipeline to create a scoring system for pre-operative BMI, the B3S3. We evaluated the B3S3 and a simple linear regression model (benchmark) in test patients whose index surgery occurred either during the same period as the training data (concurrent, 2011–2017) or after it (prospective, 2018).
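The concurrent and prospective evaluation design described above can be illustrated with a minimal Python sketch. It is not the study's code: the function name `split_cohort`, the 20% held-out fraction, and the random seed are assumptions made for illustration, since the abstract does not state how patients were allocated between training and concurrent test data.

```python
import random
from datetime import date


def split_cohort(patients, test_fraction=0.2, seed=0):
    """Split (patient_id, index_surgery_date) pairs into three groups.

    Hypothetical helper: test_fraction and seed are illustrative
    assumptions, not values reported by the study.
    """
    rng = random.Random(seed)
    # Patients operated on in 2011-2017 feed both training and concurrent test data.
    early = [p for p in patients if 2011 <= p[1].year <= 2017]
    # Patients operated on in 2018 form the prospective test data.
    prospective = [p for p in patients if p[1].year == 2018]
    rng.shuffle(early)
    n_test = round(len(early) * test_fraction)
    concurrent_test = early[:n_test]
    training = early[n_test:]
    return training, concurrent_test, prospective
```

Keeping all 2018 surgeries out of training means the prospective test set checks whether a model built on earlier years still performs on later patients.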
Results
The machine learning pipeline yielded a final scoring system that included weight-related diagnosis codes, age, and the numbers of days hospitalized and of distinct drugs dispensed in the past 6 months. In concurrent test data, the B3S3 had excellent performance (R² 0.862, 95% confidence interval [CI] 0.815–0.898) and calibration. The benchmark algorithm had good performance (R² 0.750, 95% CI 0.686–0.799) and calibration, but both were inferior to those of the B3S3. Findings in prospective test data were similar.
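As a rough illustration of how an R-squared value and its confidence interval can be obtained from observed and predicted BMI in test data, the sketch below computes R-squared as one minus the ratio of residual to total sum of squares and derives a percentile bootstrap interval. The abstract does not say how the reported intervals were calculated, so the bootstrap approach, the function names, and the defaults (`n_boot`, `alpha`, `seed`) are assumptions.

```python
import random


def r_squared(observed, predicted):
    """R-squared = 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def bootstrap_ci(observed, predicted, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for R-squared (an assumed method, not the study's)."""
    rng = random.Random(seed)
    n = len(observed)
    stats = []
    for _ in range(n_boot):
        # Resample patients with replacement, keeping observed/predicted pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        obs = [observed[i] for i in idx]
        if len(set(obs)) < 2:
            continue  # R-squared is undefined when all observed values are equal
        pred = [predicted[i] for i in idx]
        stats.append(r_squared(obs, pred))
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lower, upper
```

For example, `bootstrap_ci(observed_bmi, predicted_bmi)` would return the lower and upper bounds for a pair of equal-length lists, where `observed_bmi` and `predicted_bmi` are hypothetical names for EHR-documented and model-predicted values.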
Conclusion
The B3S3 is an accessible tool that researchers can use with claims data to obtain granular and accurate predicted values of pre-operative BMI, which may enhance confounding control and investigation of effect modification by baseline obesity levels in bariatric surgery studies utilizing claims data.
Plain Language Summary
Pre-operative BMI is an important potential confounder in comparative effectiveness studies of bariatric surgeries.
Claims data lack clinical measurements, but insurance reimbursement requirements for bariatric surgery often result in pre-operative BMI being coded in claims data.
We used a machine learning pipeline to create a model, the B3S3, to predict pre-operative BMI, as documented in the EHR, among bariatric surgery patients based on the presence of certain weight-related diagnosis codes and other patient characteristics derived from claims data.
Researchers can easily use the B3S3 with claims data to obtain granular and accurate predicted values of pre-operative BMI among bariatric surgery patients.
Abbreviations
AGB, Adjustable gastric banding; BMI, Body mass index; B3S3, BMI Before Bariatric Surgery Scoring System; CI, Confidence interval; EHR, Electronic health record; ICD, International Classification of Diseases; LASSO, Least absolute shrinkage and selection operator; MAE, Mean absolute error; MSE, Mean squared error; OLDW, Optum Labs Data Warehouse; RYGB, Roux-en-Y gastric bypass; SG, Sleeve gastrectomy.
Data Sharing Statement
The raw data that support the findings of this study are available from Optum Labs, but restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available. The materials generated for the current study are included in this published article and its Supplementary Files as well as its eSheet.
Ethics Approval and Informed Consent
This study was reviewed and approved by the Harvard Pilgrim Health Care Institutional Review Board with an exemption and waiver of individual patient consent. Exemption and waiver of individual patient consent were granted due to the use of a limited data set for this study and the execution of a data use agreement with Optum Labs to ensure specified safeguards for the information in the limited data set. All research in this study was conducted in accordance with the Declaration of Helsinki and in compliance with US data protection and privacy regulations to maintain confidentiality of patient data.
Acknowledgments
The authors would like to thank Dr. Susan Gruber for her helpful input on the analytic approaches used in this study.
Author Contributions
All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis and interpretation, or in all these areas; took part in drafting, revising, or critically reviewing the article; gave final approval of the version to be published; have agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work.
Disclosure
Dr Xiaojuan Li reports grants from the National Institute on Aging (K01AG073651), outside the submitted work. Dr Sengwee Toh reports personal fees from Pfizer and TriNetX, outside the submitted work. The authors declare that they have no other financial or non-financial competing interests in this work relevant to the interpretation or presentation of information.