Research Article

Explainable data mining model for hyperinsulinemia diagnostics

Article: 2325496 | Received 02 Sep 2023, Accepted 26 Feb 2024, Published online: 04 Mar 2024
 

ABSTRACT

We present a data mining model for the early diagnosis of hyperinsulinemia, which may help reduce the risk of diabetes, heart disease, and other chronic conditions. The dataset, gathered from 2019 to 2022 by Serbia's Healthcare Center through an observational cross-sectional study, comprises 1008 adolescents. Medical datasets are often highly imbalanced and may contain irrelevant features that hinder predictive performance. To address these challenges in medical data analysis, we propose a model employing Functional Principal Component Analysis (FPCA), which also accounts for outliers that could otherwise lead to the inclusion of irrelevant features. Unlike standard Principal Component Analysis (PCA), which is sensitive to the initial positions of cluster centers influencing the final outcome, our model integrates FPCA with K-Means clustering to improve the preprocessing stage. Additionally, we have incorporated the post-hoc explanatory method SHAP (SHapley Additive exPlanations) alongside algorithms such as Random Forest, XGBoost, and LightGBM to provide deeper insight into our model and to identify the features that contribute most to the development of hyperinsulinemia. Experimental results showed that combining FPCA with K-Means clustering enhances the accuracy of the XGBoost classifier, with this model achieving an accuracy score of 0.99.
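To make the idea behind SHAP more concrete, the following is a minimal sketch of exact Shapley value attribution for a single prediction. It is not the authors' implementation and does not use the SHAP library, which relies on faster model-specific or sampling-based algorithms. The scoring function `toy_score`, the feature names, the coefficients, and the baseline values are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for instance x relative to a baseline instance.

    Features outside a coalition are set to their baseline values.
    The cost grows exponentially with the number of features, so this
    is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Evaluate the model with coalition features taken from x
        # and all remaining features taken from the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to coalition s.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_score(z):
    # Hypothetical risk score with one interaction term (illustrative only).
    bmi, glucose, triglycerides = z
    return 0.02 * bmi + 0.5 * glucose + 0.1 * glucose * triglycerides


x = [30.0, 6.0, 2.0]          # hypothetical patient
baseline = [22.0, 5.0, 1.0]   # hypothetical reference values
phi = shapley_values(toy_score, x, baseline)

for name, contribution in zip(["bmi", "glucose", "triglycerides"], phi):
    print(f"{name}: {contribution:.3f}")
```

The attributions sum to the difference between the score for `x` and the score for `baseline`, which is the additivity property that gives SHAP its name. The interaction term is split between the two features involved, so a feature's attribution reflects both its direct effect and its share of joint effects.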

Data and code availability statement

The raw dataset used for this study is under a Non-Disclosure Agreement (NDA) and is therefore not available to the public. The code for the presented data mining model may be made available upon reasonable request.

Acknowledgements

The authors would like to thank the freepik.com website (https://www.freepik.com/) for providing a free license for parts of the figure illustrating the hyperinsulinemia process. In that figure, the muscle picture was designed by brgfx/Freepik, the pancreas picture by macrovector/Freepik, and the liver, circulation (bloodstream), and adipose tissue pictures by Freepik.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

The author(s) reported there is no funding associated with the work featured in this article.