Research Article

Tackling bias in the data for breast cancer prediction using machine learning-based decision support

Shuning Yin, Gaurav Nanda & Raji Sundararajan
Article: e.2207919 | Received 06 Oct 2022, Accepted 21 Apr 2023, Published online: 08 May 2023
 

Abstract

In this study, a machine learning (ML)-based decision support approach was developed to identify breast cancer likelihood in patients based on their background and physiological data. Two ML models, Naïve Bayes and Logistic Regression, were used to evaluate the Breast Cancer Surveillance Consortium dataset, which had a ratio of about 9:1 of non-cancer cases (‘Class 0’) to cancer cases (‘Class 1’). We manually built both balanced and unbalanced training datasets and a non-overlapping testing dataset using a stratified sampling method. For each model, we partitioned the prediction results on the testing set into two groups: the ‘Agree’ group included cases where the predictions of the models trained on the balanced and unbalanced datasets agreed, and the remaining cases formed the ‘Disagree’ group. Sensitivity and Positive Predictive Value were used as the prediction performance measures. For Naïve Bayes, the sensitivity for Class 1 increased from 0.687 in the regular predictions to 0.936 in the ‘Agree’ group; for Logistic Regression, it increased from 0.358 to 0.8306. This indicates that the ‘Agree’ group predictions were more accurate and could be labeled as high-confidence ML predictions. The ‘Agree’ group comprised 89% of the cases in the testing set, so the improved prediction performance applied to a large portion of the testing dataset.
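The ‘Agree’/‘Disagree’ partition and the two performance measures described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `sensitivity_and_ppv` and `agree_indices` are hypothetical, the label lists are made-up toy values rather than data or results from the study, and treating the ‘regular’ predictions as those of the model trained on the unbalanced data over the whole testing set is an assumption.

```python
from typing import List, Optional, Sequence, Tuple


def sensitivity_and_ppv(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[Optional[float], Optional[float]]:
    """Return (sensitivity, PPV) for the positive class; None where undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else None  # TP / (TP + FN)
    ppv = tp / (tp + fp) if (tp + fp) else None          # TP / (TP + FP)
    return sensitivity, ppv


def agree_indices(
    pred_balanced: Sequence[int], pred_unbalanced: Sequence[int]
) -> List[int]:
    """Indices of test cases where both trained variants predict the same class."""
    if len(pred_balanced) != len(pred_unbalanced):
        raise ValueError("prediction lists must have the same length")
    return [
        i
        for i, (b, u) in enumerate(zip(pred_balanced, pred_unbalanced))
        if b == u
    ]


# Toy labels for illustration only (NOT from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
pred_balanced = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]    # model trained on balanced data
pred_unbalanced = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # model trained on unbalanced data

# 'Agree' group: cases where the two variants give the same prediction.
agree = agree_indices(pred_balanced, pred_unbalanced)
agree_true = [y_true[i] for i in agree]
agree_pred = [pred_unbalanced[i] for i in agree]  # same as pred_balanced here

# Assumption: 'regular' means one model's predictions on the full testing set.
regular_metrics = sensitivity_and_ppv(y_true, pred_unbalanced)
agree_metrics = sensitivity_and_ppv(agree_true, agree_pred)
agree_share = len(agree) / len(y_true)

print("regular (sensitivity, PPV):", regular_metrics)
print("agree   (sensitivity, PPV):", agree_metrics)
print("share of test cases in 'Agree' group:", agree_share)
```

Cases outside `agree` form the ‘Disagree’ group; in a decision support setting these would be the predictions that are not labeled as high-confidence.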

Data availability statement

The data used in this study is publicly available from https://www.bcsc-research.org/data.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Shuning Yin

Shuning Yin is a first-year Ph.D. student at Purdue University School of Engineering Technology. She received her bachelor's degree in electrical engineering technology and her master's degree in engineering technology from Purdue University.

Gaurav Nanda

Gaurav Nanda is an Assistant Professor in the School of Engineering Technology at Purdue University with a focus in Industrial Engineering Technology. He works on research problems involving Applied Machine Learning, Text Mining, and Intelligent Decision Support Systems with applications in Safety, Industry 4.0, Healthcare, Learning Technologies, and other areas.

Raji Sundararajan

Raji Sundararajan is a professor in the School of Engineering Technology at Purdue University with a focus in Electrical Engineering Technology. Her research interests include electrical and laser pulse-mediated chemo drug/gene (DNA) delivery (to cancer cells, organs and humans) and electrical and electronics applications to medicine, health care, and home health gadgets.
