Abstract
In this study, a machine learning (ML)-based decision support approach was developed to identify the likelihood of breast cancer in patients based on their background and physiological data. Two ML models, Naïve Bayes and Logistic Regression, were used to evaluate the Breast Cancer Surveillance Consortium dataset, which had a ratio of about 9:1 between non-cancer cases (‘Class 0’) and cancer cases (‘Class 1’). We manually built both balanced and unbalanced training datasets and a non-overlapping testing dataset using a stratified sampling method. For each model, we partitioned the prediction results on the testing set into two groups: the ‘Agree’ group included cases where the predictions obtained with the balanced and unbalanced training datasets agreed, and the remaining cases formed the ‘Disagree’ group. Sensitivity and Positive Predictive Value were used as the prediction performance measures. For Naïve Bayes, the sensitivity for Class 1 increased from 0.687 in the regular predictions to 0.936 in the ‘Agree’ group, and for Logistic Regression, it increased from 0.358 to 0.8306. This indicates that the ‘Agree’ group predictions were more accurate and could be labeled as high-confidence ML predictions. The ‘Agree’ group comprised 89% of the cases in the testing set, so the improved prediction performance applied to a large portion of the testing dataset.
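The agreement-based partition and the two performance measures described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `split_by_agreement` and `sensitivity_and_ppv` are hypothetical, the labels are invented toy values rather than Breast Cancer Surveillance Consortium data, and the two prediction lists stand in for the outputs of one model trained on the balanced and on the unbalanced training dataset.

```python
def sensitivity_and_ppv(y_true, y_pred, positive=1):
    """Return (sensitivity, PPV) for the positive class; None where undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else None  # TP / (TP + FN)
    ppv = tp / (tp + fp) if (tp + fp) else None          # TP / (TP + FP)
    return sensitivity, ppv


def split_by_agreement(y_true, pred_balanced, pred_unbalanced):
    """Partition test cases by whether the two predictions agree."""
    agree = {"y_true": [], "y_pred": []}
    disagree = {"y_true": [], "pred_balanced": [], "pred_unbalanced": []}
    for t, b, u in zip(y_true, pred_balanced, pred_unbalanced):
        if b == u:
            agree["y_true"].append(t)
            agree["y_pred"].append(b)  # both predictions are identical here
        else:
            disagree["y_true"].append(t)
            disagree["pred_balanced"].append(b)
            disagree["pred_unbalanced"].append(u)
    return agree, disagree


# Toy labels, invented for illustration only (1 = cancer, 0 = non-cancer).
y_true          = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
pred_balanced   = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]
pred_unbalanced = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]

agree, disagree = split_by_agreement(y_true, pred_balanced, pred_unbalanced)
agree_fraction = len(agree["y_true"]) / len(y_true)

print("Share of test cases in the 'Agree' group:", agree_fraction)
print("Class 1 (sensitivity, PPV), 'Agree' group:",
      sensitivity_and_ppv(agree["y_true"], agree["y_pred"]))
print("Class 1 (sensitivity, PPV), balanced-data predictions, all cases:",
      sensitivity_and_ppv(y_true, pred_balanced))
```

In this sketch, the 'Agree' group metrics are computed only on the cases where both predictions coincide, which is how a subset of predictions can be reported separately as higher confidence. The toy numbers are not meant to reproduce the sensitivity values reported in the study.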
Data availability statement
The data used in this study is publicly available from https://www.bcsc-research.org/data.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Additional information
Notes on contributors
Shuning Yin
Shuning Yin is a first-year Ph.D. student in the School of Engineering Technology at Purdue University. She received a bachelor's degree in electrical engineering technology and a master's degree in engineering technology from Purdue University.
Gaurav Nanda
Gaurav Nanda is an Assistant Professor in the School of Engineering Technology at Purdue University, with a focus on Industrial Engineering Technology. He works on research problems involving Applied Machine Learning, Text Mining, and Intelligent Decision Support Systems with applications in Safety, Industry 4.0, Healthcare, Learning Technologies, and other areas.
Raji Sundararajan
Raji Sundararajan is a professor in the School of Engineering Technology at Purdue University, with a focus on Electrical Engineering Technology. Her research interests include electrical and laser pulse-mediated chemo drug/gene (DNA) delivery (to cancer cells, organs, and humans) and electrical and electronics applications to medicine, health care, and home health gadgets.