ORIGINAL RESEARCH

Machine Learning Algorithm to Estimate Distant Breast Cancer Recurrence at the Population Level with Administrative Data

Pages 559-568 | Received 03 Dec 2022, Accepted 01 Apr 2023, Published online: 05 May 2023
 

Abstract

Purpose

High-quality population-based cancer recurrence data are scarce, mainly because of the complexity and cost of registration. For the first time in Belgium, we developed a tool to estimate distant recurrence after a breast cancer diagnosis at the population level, based on real-world cancer registration and administrative data.

Methods

Data on distant cancer recurrence (including progression) in patients diagnosed with breast cancer between 2009 and 2014 were collected from medical files at 9 Belgian centers. These data served as the gold standard to train, test and externally validate an algorithm. Distant recurrence was defined as the occurrence of distant metastases between 120 days and 10 years after the primary diagnosis, with follow-up until December 31, 2018. The gold standard data were linked to population-based data from the Belgian Cancer Registry (BCR) and to administrative data sources. Potential features for detecting recurrence in administrative data were defined based on expert opinion from breast oncologists and subsequently selected using bootstrap aggregation. Based on the selected features, a classification and regression tree (CART) analysis was performed to construct an algorithm that classifies patients as having a distant recurrence or not.
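The two modelling steps described above (feature selection over bootstrap samples, followed by a classification tree) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature names, the toy data, the selection rule (keep a feature if it is used in at least half of the bootstrap trees), the tree depth and the Gini splitting criterion are all assumptions chosen for illustration.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels, features):
    """Return the binary feature with the lowest weighted Gini, or None."""
    best_feature, best_score = None, None
    for f in features:
        left = [y for r, y in zip(rows, labels) if r[f] == 0]
        right = [y for r, y in zip(rows, labels) if r[f] == 1]
        if not left or not right:
            continue  # feature is constant in this node
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best_score is None or score < best_score:
            best_feature, best_score = f, score
    return best_feature

def grow(rows, labels, features, depth=0, max_depth=3):
    """Grow a tree: a leaf is 0 or 1, an internal node is (feature, no, yes)."""
    majority = int(2 * sum(labels) >= len(labels))
    if depth == max_depth or gini(labels) == 0.0:
        return majority
    f = best_split(rows, labels, features)
    if f is None:
        return majority
    children = []
    for value in (0, 1):
        idx = [i for i, r in enumerate(rows) if r[f] == value]
        children.append(grow([rows[i] for i in idx], [labels[i] for i in idx],
                             features, depth + 1, max_depth))
    return (f, children[0], children[1])

def predict(tree, row):
    """Walk the tree until a leaf (0 = no recurrence, 1 = recurrence)."""
    while isinstance(tree, tuple):
        f, no, yes = tree
        tree = yes if row[f] == 1 else no
    return tree

def used_features(tree):
    """Set of features that appear in any split of the tree."""
    if not isinstance(tree, tuple):
        return set()
    f, no, yes = tree
    return {f} | used_features(no) | used_features(yes)

def select_features(rows, labels, features, n_boot=25, min_share=0.5, seed=0):
    """Keep features used in at least min_share of the bootstrap trees."""
    rng = random.Random(seed)
    counts = {f: 0 for f in features}
    n = len(rows)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        tree = grow([rows[i] for i in idx], [labels[i] for i in idx], features)
        for f in used_features(tree):
            counts[f] += 1
    return [f for f in features if counts[f] / n_boot >= min_share]

# Hypothetical binary features and toy data; none of this comes from the study.
FEATURES = ["new_chemo_claim", "palliative_imaging", "unrelated_flag"]
data_rng = random.Random(42)
rows = [{f: data_rng.randint(0, 1) for f in FEATURES} for _ in range(300)]
# Toy rule: recurrence only when both informative features are present.
labels = [int(r["new_chemo_claim"] and r["palliative_imaging"]) for r in rows]

selected = select_features(rows, labels, FEATURES)
tree = grow(rows, labels, selected)
accuracy = sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(rows)
print(selected, accuracy)
```

On this noise-free toy rule the uninformative feature should never be chosen for a split, so it is dropped before the final tree is grown. Real administrative data are far noisier, which is why the split-frequency threshold matters in practice.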

Results

A total of 2507 patients were included, of whom 216 had a distant recurrence in the clinical data set. The algorithm achieved a sensitivity of 79.5% (95% CI 68.8–87.8%), a positive predictive value (PPV) of 79.5% (95% CI 68.8–87.8%), and an accuracy of 96.7% (95% CI 95.4–97.7%). External validation yielded a sensitivity of 84.1% (95% CI 74.4–91.3%), a PPV of 84.1% (95% CI 74.4–91.3%), and an accuracy of 96.8% (95% CI 95.4–97.9%).
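As a reminder of how these measures relate to the confusion matrix, the sketch below computes sensitivity, PPV and accuracy with 95% confidence intervals. The counts are hypothetical, since the abstract does not report TP, FP, FN and TN, and the Wilson score interval is an assumption; the abstract does not state which interval method was used.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

def classification_metrics(tp, fp, fn, tn):
    """Return {name: (estimate, (ci_low, ci_high))} from confusion-matrix counts."""
    fractions = {
        "sensitivity": (tp, tp + fn),          # TP / (TP + FN)
        "ppv": (tp, tp + fp),                  # TP / (TP + FP)
        "accuracy": (tp + tn, tp + fp + fn + tn),
    }
    return {name: (k / n, wilson_interval(k, n)) for name, (k, n) in fractions.items()}

# Hypothetical counts, not taken from the study.
results = classification_metrics(tp=40, fp=10, fn=10, tn=440)
for name, (estimate, (low, high)) in results.items():
    print(f"{name}: {estimate:.3f} (95% CI {low:.3f} to {high:.3f})")
```

Because sensitivity and PPV share the numerator TP, they coincide whenever the number of false positives equals the number of false negatives, as in these hypothetical counts. Identical sensitivity and PPV values, as reported above, would be consistent with that situation, although the underlying counts are not given here.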

Conclusion

Our algorithm detected distant breast cancer recurrence with good overall accuracy (96.8%), as observed in the first multi-centric external validation exercise.

Abbreviations

AUC, Area under the curve; ATC, Anatomical Therapeutic Chemical classification; AVIQ, “Agence pour une Vie de Qualité”; BCR, Belgian Cancer Registry; CA15-3, Cancer antigen 15-3; CART, Classification and regression tree; CBSS, Crossroads Bank for Social Security; CT, Computed tomography; FN, False negatives; FP, False positives; ICD, International Classification of Diseases and Related Health Problems; IMA, InterMutualistic Agency; MDT, Multidisciplinary team meeting; MRI, Magnetic Resonance Imaging; MZG, “Minimale Ziekenhuis Gegevens”; NPV, Negative predictive value; PPV, Positive predictive value; PET-CT, Positron emission tomography – computed tomography; SE, Standard error; SMN, Secondary malignant neoplasm; TN, True negatives; TP, True positives.

Data Sharing Statement

The data that support the findings of this study are available upon reasonable request. The data can be given within the secured environment of the Belgian Cancer Registry, according to its regulations, and only upon approval by the Information Security Committee.

Ethics Approval and Consent to Participate

This retrospective chart review study involving human participants was conducted in accordance with the ethical standards of the institutional and national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. This study was approved by the Ethics Committee of University Hospitals Leuven (S60928). Informed consent for the use of their data was obtained from all participants.

Author Contributions

All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis and interpretation, or in all these areas; took part in drafting, revising or critically reviewing the article; gave final approval of the version to be published; have agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work.

Disclosure

The authors report no conflicts of interest in this work.

Additional information

Funding

This work was supported by VZW THINK-PINK (Belgium).