
Explainable artificial intelligence-driven gestational diabetes mellitus prediction using clinical and laboratory markers

Article: 2330266 | Received 17 Aug 2023, Accepted 08 Mar 2024, Published online: 26 Mar 2024

Abstract

Gestational diabetes is characterized by hyperglycemia diagnosed during pregnancy. High blood sugar levels are likely to affect both the mother and child. The disease frequently goes undiagnosed because its symptoms are rarely prominent, resulting in severe unmanaged hyperglycemia, obesity, childbirth complications and overt diabetes. Artificial intelligence is increasingly deployed in the medical field, revolutionizing and automating data processing and decision-making. Machine learning, a subset of artificial intelligence, can create reliable healthcare screening and predictive systems. With machine learning, it is possible to detect gestational diabetes and gain deeper insights into the disease. This study develops a reliable clinical decision support system for gestational diabetes detection using multiple machine learning architectures combined with five data balancing methods. An ensemble stack trained on the synthetic minority oversampling technique with edited nearest neighbor obtained the highest performance, with accuracy, sensitivity and precision of 96%, 95% and 99%, respectively. Additionally, a layer of explainable artificial intelligence was added to the best-performing model using libraries such as SHapley Additive exPlanations, Local Interpretable Model-agnostic Explanations, Quantum lattice, the Explain Like I'm 5 algorithm, Anchor and feature importance. The importance of factors such as the visceral adipose deposit and their contribution toward the prediction of gestational diabetes is explored. This research aims to provide a meaningful and interpretable clinical decision support system to aid healthcare professionals in early gestational diabetes detection and improved patient management.

1. Introduction

Gestational diabetes, or gestational diabetes mellitus (GDM), occurs when a pregnant woman who did not have diabetes before gestation develops insulin resistance and elevated glucose levels. During late pregnancy, the body's inability to produce enough insulin is often linked to obesity (McIntyre et al., Citation2019). Research suggests genetics plays a significant role in the development of GDM (Shaat & Groop, Citation2007). If left unmanaged or undiagnosed, it can endanger the health of both the mother and the child. The mother is likely to suffer from high blood pressure, pre-eclampsia and long-term adverse outcomes like chronic type-2 diabetes (Damm et al., Citation2016). The baby is at risk of developing macrosomia, which could lead to labor complications. The child's pancreas may produce excess insulin, causing low blood sugar. Some studies indicate that GDM could delay brain maturity and cause long-term nervous system abnormalities (Perna et al., Citation2015). In most cases, GDM is treatable. However, GDM does not show any discernible external symptoms, so blood sugar testing during pregnancy is the primary means of diagnosing it. In addition, large amounts of visceral abdominal fat and high blood pressure are associated with insulin resistance and hyperglycemia (Alwash et al., Citation2021).

The healthcare industry has been a pioneer in implementing new technologies. Machine learning (ML), a subset of Artificial Intelligence, is pivotal in medical advancements. With its ability to handle and process complex medical data, ML is deployed to predict and treat diseases, develop and deliver new drugs, manage and organize patient medical records and develop clinical decision support systems to screen patients (Kaur et al., Citation2018; Khanna, Chadaga, Sampathila, Chadaga, et al., Citation2023; Khanna, Chadaga, Sampathila, Prabhu, et al., Citation2023; Rodrigues & Bernardes, Citation2020; Santosh & Gaur, Citation2021).

ML can weigh multiple parameters at once to make reliable and holistic predictions. This screening capability has led to multiple studies that used ML to detect gestational diabetes. Besides the primary blood sugar and blood pressure tests, numerous parameters have been considered. Xiong et al. (Citation2022) developed a GDM prediction system for patients in their first 19 weeks of pregnancy. Four hundred and ninety pregnant women were considered, among which 215 had GDM. The 43 test parameters consisted of bloodwork and hepatic and renal markers. Support vector machine (SVM) and light gradient boosting machine (LGBM) were used to build two pipelines for predicting GDM. The models trained with thrombin parameters gave higher area under the receiver operating characteristic curve (AUC) scores of 0.942 and 0.98 for Light GBM and SVM, respectively, indicating the significance of thrombin levels in the prediction of GDM. Du et al. (Citation2022) created an explainable ML prediction system for GDM. They gathered a dataset of 565 pregnant women from the National Maternity Hospital in Dublin, Ireland. Redundant features were identified using Pearson correlation. The data were balanced with the synthetic minority oversampling technique (SMOTE). Models like Logistic Regression (LR), Random Forest (RF), SVM, Adaptive Boosting and Extreme Gradient Boosting (XGBoost) were trained. Multiple pipelines were created considering various subsets of the data. SVM outperformed all other models with the highest accuracy of 76.1%.

Furthermore, SHapley Additive exPlanations (SHAP) were used to explain the model predictions. Gnanadass (Citation2020) proposed an ML GDM prediction model. The dataset was the Pima Indians Diabetes Database (PIMA), consisting of 768 samples with 268 GDM-positive cases and eight parameters. ML classifiers were trained and evaluated; XGBoost achieved the highest accuracy and AUC score of 77.54% and 0.99, respectively.

An ML-based risk prediction score architecture was proposed by Liu et al. (Citation2021). A cohort of 19,331 women from Tianjin, China, was selected for the study. Feature importance with XGBoost assisted in identifying significant features. XGBoost produced a higher AUC score of 0.742. An early prediction system based on the elemental contents of the fingernails and urine samples was developed by Chan et al. (Citation2023). A total of 67 women were chosen, and 33 were GDM-positive patients. The concentrations of various elements in the fingernails and urine samples were analyzed. These data were then used to train multiple ML models such as naïve Bayes, SVM, K-nearest neighbor (KNN), discriminant analysis and ensemble models. The ensemble modeling using a random subspace algorithm outperformed the remaining algorithms with the highest AUC score of 0.81 when trained on the fingernail elements of nickel, copper and selenium.

From the above studies, ML is effective in diagnosing GDM. Parameters such as the child's birthweight and maternal central visceral fat have also been associated with GDM (Alwash et al., Citation2021; McIntyre et al., Citation2019), yet they have not been extensively explored in recent studies. Health data often show a drastic imbalance in the dependent class; in this dataset, 18 out of 133 samples had gestational diabetes. When the target class is unevenly distributed, a model is likely to learn the majority class and ignore the minority class, producing inaccurate and unreliable results, especially for the minority class. Data balancing techniques address this issue by oversampling the minority class or undersampling the majority class, creating a more balanced dataset and, in turn, more accurate and reliable models. In this study, we have developed and evaluated multiple ML pipelines using various combinations of five data balancing techniques and 12 classifiers. The best ML classifiers are then interpreted using explainable artificial intelligence (XAI) tools such as SHAP, Local Interpretable Model-agnostic Explanations (LIME), Qlattice, Explain Like I'm 5 (ELI5) and Anchor. XAI tools such as ELI5, Qlattice and Anchor have rarely been used in medical research. Furthermore, the XAI interpretations are validated against the feature importance obtained from tree-based classifiers. XAI methods are widely deployed to ensure end-users understand the reasoning behind a prediction. When used effectively, XAI can assist in confirming prior knowledge, challenging existing knowledge and producing new theories for a problem. These model interpretation tools have further been deployed to validate the significance of the visceral adipose deposit and the child's weight at birth and their relation with GDM.

The contributions of this study are as follows:

  • Five data balancing techniques have been used and compared. They are SMOTE, Borderline SMOTE, Adaptive Synthetic Algorithm (ADASYN), SMOTE-Tomek and SMOTE-ENN.

  • Thirteen ML classifiers such as LR, Decision Tree (DT), RF, SVM, naïve Bayes, KNN, AdaBoost, XGBoost, Extratrees, Light GBM, CatBoost, Voting Classifier and an ensemble stack have been trained and evaluated.

  • The results obtained by the best ML models have been interpreted using XAI techniques such as SHAP, LIME, Qlattice, ELI5, Anchor and feature importance. To the best of our knowledge, no previous study has utilized five heterogeneous XAI techniques to interpret GDM predictions.

  • Discussions regarding the medical significance of the features and research findings are presented.

The upcoming sections encompass the following: Section 2 demonstrates the process flow adopted to create multiple ML pipelines. Section 3 elaborates on the results obtained by comparing the architectures. Additionally, the evaluation of the models and observations from the XAI tools are presented in this section. The final section concludes the research and provides a future scope for this work. Figure 1 represents the workflow of the research.

Figure 1. Workflow of the research.


2. Method and methodology

2.1. Data description

This study employed an open-source dataset from Kaggle uploaded by Kamyab Abedi (Kaggle, Citation2023). The dataset consisted of 133 pregnant women (in their first 20 weeks of pregnancy) with 14 variables. This dataset was part of a study conducted to predict GDM during delivery using the measurement of visceral adipose tissue. Eighteen of the 133 women were diagnosed GDM-positive. Table 1 presents the description of each variable. Many traditional parameters have been considered, such as first fasting glucose, blood pressure, patient body mass index (BMI) and gestational age. In addition, less conventional variables, such as the maternal visceral adipose deposit and the child's weight at birth, have also been included.

Table 1. Data description table.

2.2. Data pre-processing

Data pre-processing is an essential step before training the models. This step ensures that data inconsistencies are handled before the algorithms are executed. Python libraries like Pandas, NumPy, SciPy and Scikit-learn assisted in data pre-processing. Table 2 depicts the statistical description and the missing values in the dataset. Variables like 'first fasting glucose (mg/dl)' and 'pregnancies (number)' have multiple missing values. The median of each feature was used to replace its missing values; replacing them with the mean could lead to wrong imputations, as the statistical mean is sensitive to outliers. The 'Number' parameter has no statistical importance and was removed from the final dataset. Figure 2 illustrates a heatmap of the correlation between the selected features, representing the relationships between variables through their statistical correlation. It can be observed that visceral fat is highly correlated with the diagnosis of GDM.
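A minimal pre-processing sketch is shown below. The file name and the exact column headers (taken from the variable names reported in this article) are assumptions and should be treated as placeholders for the actual Kaggle dataset.

```python
# A rough pre-processing sketch; the file name and column headers are assumptions.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("visceral_adipose.csv")      # hypothetical file name for the Kaggle dataset
df = df.drop(columns=["Number"])              # identifier column with no statistical value

# Median imputation for the columns reported to contain missing values
for col in ["first fasting glucose (mg/dl)", "pregnancies (number)"]:
    df[col] = df[col].fillna(df[col].median())

# Correlation heatmap of the numeric features (cf. Figure 2)
corr = df.select_dtypes("number").corr()
sns.heatmap(corr, cmap="coolwarm")
plt.title("Correlation heatmap")
plt.tight_layout()
plt.show()
```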

Figure 2. Correlation heatmap.


Table 2. Statistical description of the parameters.

During the data description, it was observed that there was a significant imbalance between the GDM-positive and GDM-negative cases: 18 of the 133 samples had been diagnosed GDM-positive. This uneven distribution of the target class leads to inaccurate and unreliable results, so data balancing is necessary to alleviate the issue. Two widely used approaches are oversampling and undersampling; undersampling is known to cause loss of information through the deletion of data samples (Gong et al., Citation2019; Peng & Cheng, Citation2020). Hence, the oversampling approach was adopted to balance the target variable. Five different data balancing techniques, listed below, were explored to balance the training data (a brief code sketch of these samplers follows the list). Only the training data were balanced to avoid bias in the model evaluation. All methods use distinct mathematical algorithms that create synthetic data points from the existing samples. Figure 3 compares the effects of the various balancing techniques on the target class.

Figure 3. Comparison of the class counts of the target variable under five different balancing techniques; the vertical axis represents the number of cases in each class.

  • Synthetic Minority Oversampling Technique (SMOTE): SMOTE oversamples the minority class by generating synthetic samples that interpolate between existing minority samples and their nearest neighbors. This method can balance the data without providing fundamentally new information to the model (Koto, Citation2014).

  • Adaptive Synthetic Algorithm (ADASYN): This balancing technique adaptively generates minority samples based on their local density distribution using KNN. In this algorithm, the weights of different minority samples are adaptively changed to compensate for the skewed data distribution. ADASYN assists in improving the classifier training by reducing the bias introduced by the data imbalance (Nasarian et al., Citation2020).

  • Borderline Synthetic Minority Oversampling (Borderline SMOTE): This technique extends SMOTE. When a minority sample lies as an outlier within the majority region, SMOTE creates a bridge of synthetic samples toward it. Borderline SMOTE addresses this drawback: it treats such outlying minority samples as noise and ignores them while creating synthetic data (Han et al., Citation2005).

  • Synthetic Minority Oversampling with Tomek links (SMOTE-Tomek): This method combines the SMOTE technique of generating synthetic samples and the ability of Tomek links to be identified and removed from the majority class. On observing the nearest neighbor of two points, if these two data points belong to two different classes (minority and majority), these points form a Tomek link. Removing Tomek link points ensures the increased separation between the two classes for better classifier training (Jonathan et al., Citation2020).

  • Synthetic Minority Oversampling using Edited Nearest Neighbor (SMOTE-ENN): This method integrates the power of SMOTE to produce synthetic samples and the deletion approach ENN uses. When considering the principle of ENN deletion, if a data sample has most of its KNN in a different class, then such observations and their KNN are deleted. This method is considered superior to SMOTE-Tomek because it can delete all the data sample’s K-nearest sample points rather than just one point. With SMOTE-ENN, better separation of the two classes is achieved, which can improve model training (Muntasir Nishat et al., Citation2022).
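The sketch below shows one way the five samplers could be instantiated with the imbalanced-learn library; the variable names X_train and y_train are placeholders for the training split.

```python
# Sketch of the five balancing strategies using imbalanced-learn; X_train/y_train are assumed.
from collections import Counter
from imblearn.over_sampling import SMOTE, ADASYN, BorderlineSMOTE
from imblearn.combine import SMOTETomek, SMOTEENN

samplers = {
    "SMOTE": SMOTE(random_state=42),
    "ADASYN": ADASYN(random_state=42),
    "Borderline SMOTE": BorderlineSMOTE(random_state=42),
    "SMOTE-Tomek": SMOTETomek(random_state=42),
    "SMOTE-ENN": SMOTEENN(random_state=42),
}

for name, sampler in samplers.items():
    X_res, y_res = sampler.fit_resample(X_train, y_train)  # resample only the training data
    print(name, Counter(y_res))                             # class counts after balancing (cf. Figure 3)
```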

This dataset has a small number of samples, and in such cases ML models tend to overfit. K-fold cross-validation was used to mitigate potential overfitting, with five folds of the data considered for training and testing (Anguita et al., Citation2012). Furthermore, data standardization was performed, as a few ML algorithms require scaling. StandardScaler was chosen over min–max scaling: min–max scaling rescales all data between 0 and 1, whereas StandardScaler standardizes features based on their mean and standard deviation. Such standardization makes the classifiers less sensitive to outliers (Ferreira et al., Citation2019).
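A sketch of stratified five-fold cross-validation, with scaling and SMOTE-ENN applied only inside each training fold, is given below. The classifier, X and y are placeholders, and imbalanced-learn's pipeline is one possible way (not necessarily the exact one used here) to keep the validation folds untouched.

```python
# Cross-validation sketch: scaling and resampling happen inside each training fold only.
from imblearn.pipeline import Pipeline
from imblearn.combine import SMOTEENN
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression

pipe = Pipeline(steps=[
    ("scale", StandardScaler()),               # zero mean, unit variance
    ("balance", SMOTEENN(random_state=42)),    # applied to the training folds only
    ("clf", LogisticRegression(max_iter=1000)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
print(scores.mean(), scores.std())
```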

2.3. Model training

Several supervised ML classifiers were trained on five sets of pipelines. The LR model provides simple parametric binary classification. Among non-parametric techniques, the KNN algorithm classifies newer samples based on their similarity to previously seen data points. SVM is a widely used classifier; depending on the kernel, it can fit more complex decision boundaries (Anguita et al., Citation2010). In this study, we used the polynomial kernel of SVM. During the literature survey, it was observed that tree-based models tend to obtain better results (Zhang et al., Citation2022). Hence, DT, RF, XGBoost, AdaBoost, Extratrees, LightGBM and CatBoost were trained. Furthermore, we created a custom stack considering the outputs of all the mentioned ML classifiers, with XGBoost as the meta-learner. When individual models are biased toward a particular feature, deploying models like a voting classifier is suggested; we deployed both soft and hard voting classifiers. The soft voting classifier considers the probabilities of the individual classifier predictions, whereas the hard voting classifier predicts the mode of the predictions made by the individual models (de Oliveira et al., Citation2022). Figure 4 depicts the design of the stack and the voting classifier. Hyperparameter tuning to obtain the best parameters for each ML model was performed using the GridSearchCV algorithm (Ranjan et al., Citation2019).
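A condensed sketch of the stack, the two voting classifiers and a grid search is shown below; the base learners, the parameter grid and the variable names (X_res, y_res) are illustrative choices rather than the exact configuration used in the study.

```python
# Sketch of the ensemble stack, voting classifiers and GridSearchCV; base learners are illustrative.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

base_learners = [
    ("rf", RandomForestClassifier(random_state=42)),
    ("svm", SVC(kernel="poly", probability=True, random_state=42)),
    ("knn", KNeighborsClassifier()),
]

# Ensemble stack with XGBoost as the meta-learner
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=XGBClassifier(eval_metric="logloss"),
                           cv=5)
stack.fit(X_res, y_res)          # X_res, y_res: the balanced training data

# Soft voting averages predicted probabilities; hard voting takes the majority vote
soft_vote = VotingClassifier(estimators=base_learners, voting="soft")
hard_vote = VotingClassifier(estimators=base_learners, voting="hard")

# Example grid search for one base learner
grid = GridSearchCV(RandomForestClassifier(random_state=42),
                    param_grid={"n_estimators": [100, 300], "max_depth": [None, 5, 10]},
                    cv=5, scoring="f1")
grid.fit(X_res, y_res)
print(grid.best_params_)
```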

Figure 4. Architecture of the following: (a) custom stack and (b) voting classifier.


2.4. Performance metrics

Performance metrics such as accuracy, precision, recall, F1 score, AUC score, Jaccard score, Matthews correlation coefficient (MCC), log loss and Hamming loss were utilized to evaluate and compare the efficacy of the classifiers. Most metrics compare the actual and predicted values. Predictions fall into four categories: true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN; Kaushik et al., Citation2023). Accuracy is a generalized metric describing the overall performance of a classifier. Precision measures the quality of the positive predictions made by the classifier, i.e., the proportion of predicted GDM cases that are actual GDM cases; a model with no false positives has a precision of 100%. Recall, or sensitivity, quantifies the proportion of actual positive cases that are correctly identified; a model with no false negatives obtains a recall of 100% (Junker et al., Citation1999). The Jaccard similarity score measures the similarity between two sets of data points, comparing the prediction labels with the corresponding ground-truth labels (Thekadayil, Citation2018). The MCC measures the association between two binary variables, gauging the difference between the predictions and the actual values. This metric is primarily used when the classes are imbalanced; a higher MCC indicates better results (Chicco & Jurman, Citation2020).

Furthermore, loss scores such as Hamming loss and log loss were estimated for each model. In contrast to the abovementioned metrics, the lower the loss score, the better the model performance (Butucea et al., Citation2018; No, Citation2019). The equations of all considered performance metrics are stated in Equations (1)–(8). The receiver operating characteristic (ROC) curve is a performance evaluation tool: a probability curve with the true positive rate on the y-axis against the false positive rate on the x-axis, showing the performance of the model at all classification thresholds. The area under the curve (AUC) depicts the model's capability to distinguish between the two binary classes; an AUC score close to 1 indicates a high-performing model.

(1) $\text{Accuracy} = \dfrac{\text{Correct predictions}}{\text{Total predictions made}} = \dfrac{TP + TN}{TP + TN + FP + FN}$

(2) $\text{Precision} = \dfrac{\text{Correct positive predictions}}{\text{Total positive predictions made}} = \dfrac{TP}{TP + FP}$

(3) $\text{Recall} = \dfrac{\text{Correct positive predictions}}{\text{Total actual positives}} = \dfrac{TP}{TP + FN}$

(4) $\text{F1 score} = 2 \times \dfrac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = \dfrac{TP}{TP + \frac{1}{2}(FP + FN)}$

(5) $\text{Jaccard score} = J(C_1, C_2) = \dfrac{|C_1 \cap C_2|}{|C_1 \cup C_2|}$, where $C_1$ and $C_2$ are two sets of data.

(6) $\text{MCC} = \dfrac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$

(7) $\text{Hamming loss} = \dfrac{1}{nL} \sum_{i=1}^{n} \sum_{j=1}^{L} \mathbb{1}\!\left[ y_j^{(i)} \neq \hat{y}_j^{(i)} \right]$, where $n$ is the number of training samples, $y_j^{(i)}$ is the true label of the $i$th training sample in the $j$th class and $\hat{y}_j^{(i)}$ is the predicted label of the $i$th training sample in the $j$th class.

(8) $H_p(q) = -\dfrac{1}{N} \sum_{i=1}^{N} \left[ y_i \log\big(p(y_i)\big) + (1 - y_i) \log\big(1 - p(y_i)\big) \right]$, where $y_i$ represents the actual class.
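The metrics in Equations (1)–(8) map directly onto scikit-learn functions; a sketch for one fitted model is shown below, where stack, X_test and y_test are placeholder names.

```python
# Sketch computing the evaluation metrics of Equations (1)-(8) with scikit-learn.
from sklearn.metrics import (accuracy_score, precision_score, recall_score, f1_score,
                             roc_auc_score, jaccard_score, matthews_corrcoef,
                             hamming_loss, log_loss)

y_pred = stack.predict(X_test)
y_prob = stack.predict_proba(X_test)[:, 1]   # probability of the GDM-positive class

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "auc": roc_auc_score(y_test, y_prob),
    "jaccard": jaccard_score(y_test, y_pred),
    "mcc": matthews_corrcoef(y_test, y_pred),
    "hamming_loss": hamming_loss(y_test, y_pred),
    "log_loss": log_loss(y_test, y_prob),
}
print(metrics)
```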

3. Results and discussion

This section presents a detailed analysis of the performance of each model. Furthermore, explainable AI techniques such as SHAP, LIME, ELI5, Qlattice and Anchor are deployed on the best-performing architectures. Feature importance for each tree-based model is then used to validate the mentioned XAI techniques.

3.1. Model evaluation

Five sets of pipelines were created, one for each data balancing technique, and 12 ML models were trained on each set of balanced data. The accuracy comparison of all data balancing pipelines and their effect on the classifiers is presented in Table 3. Pipelines containing models trained on SMOTE-ENN data performed considerably better than the other data balancing methods, whereas the traditionally used SMOTE technique produced most of the lowest model accuracies. Classifiers like DT, SVM, AdaBoost, CatBoost, STACK and both voting classifiers had their lowest accuracies when trained with SMOTE data. Comparing the models trained on ADASYN data with the SMOTE pipelines, DT, SVM, AdaBoost, Extratrees, Light GBM, CatBoost, STACK and the hard voting classifier showed improved performance; however, KNN obtained its lowest accuracy in the ADASYN pipeline. Results of the Borderline SMOTE pipeline were comparable to ADASYN, with the STACK model and the soft voting classifier scoring higher than in the previous two techniques. SMOTE-Tomek pipelines had higher scores than SMOTE, ADASYN and Borderline SMOTE; the principle of Tomek links greatly benefited the soft voting classifier and AdaBoost, producing 95% and 93% accuracy, respectively. SMOTE-ENN was the best-performing data balancing technique, with improved performance for LR, DT, KNN, CatBoost, the hard voting classifier and the STACK. Tree-based models like XGBoost, CatBoost and RF performed comparably well across the different balancing techniques.

Table 3. Comparison of the model accuracies obtained from different data balancing pipelines.

The SMOTE-ENN pipeline contained the majority of the best-performing classifiers. Table 4 provides a detailed view of the individual performance metrics for each classifier trained on SMOTE-ENN data. Multiple models provided a 99% recall score. DT, KNN and STACK had the highest precision of 99%, while KNN, CatBoost and STACK provided an F1 score of 97%. Notably, XGBoost and the voting classifiers had similar results. When evaluating the best-performing classifier, CatBoost and STACK had the best results. Both classifiers had the same accuracy and F1 score; however, STACK had a higher AUC of 0.96 and a Matthews correlation coefficient of 0.90. Hence, it was concluded that the SMOTE-ENN with STACK pipeline outperformed all other architectures. Figure 5 depicts AUC visualizations for four classifiers: RF, CatBoost, STACK and the soft voting classifier. RF had the highest AUC score of 0.98, both CatBoost and STACK had the same score of 0.95, and the soft voting classifier achieved an AUC score of 0.93.

Figure 5. AUC curves for four classifiers trained on SMOTE + ENN data. Plot: (a) the AUC plot for Random Forest, (b) the AUC plot for CatBoost, (c) the AUC plot of the STACK model and (d) the AUC plot for the Soft Voting Classifier.


Table 4. Performance comparison of classifiers trained on SMOTE-ENN data.

3.2. Explaining ML models

Explainable AI or XAI consists of tools and architectures that aim to increase ML classifier transparency and give meaning to the rationale behind a specific prediction. These tools are widely used to demystify the Black-Box of Artificial Intelligence. XAI techniques are crucial to building trust and confidence to accept predictions made by a decision support system. Furthermore, XAI tools can assist in classifier debugging. This research deployed tools like SHAP, LIME, ELI5, Quantum lattice and Anchor.

3.2.1. SHapley Additive exPlanations

SHAP is a mathematical tool for explaining ML models proposed by Lundberg and Lee (Citation2017). It is based on the principles of game theory, whereby the contribution of each feature to a prediction is quantified using SHAP values, and it can provide both global and local explanations for a model. The STACK trained on SMOTE-ENN had the highest performance among all architectures. Therefore, SHAP was used to explain the STACK ensemble utilizing three widely used SHAP plots: the beeswarm, waterfall and force plots (Santosh & Gaur, Citation2021).

  • SHAP Beeswarm plot: The beeswarm plot depicts an information-dense summary of the contribution of each feature to the prediction (Chadaga et al., Citation2023; Khanna, Chadaga, Sampathila, Prabhu, et al., Citation2023). Figure 6 illustrates the SHAP beeswarm plot. Each point on this plot represents an instance of the dataset, the color gradient depicts the feature value and the x-axis represents the SHAP values. The features are ranked from top to bottom based on their significance for the target variable. SHAP identified that the topmost feature considered by the STACK in GDM prediction was 'central armellini fat (mm)'. A positive relationship between the visceral fat and the SHAP values can be seen: a higher fat deposit corresponds to a positive SHAP value. In a binary classification problem, as considered in this study, a positive SHAP value indicates a prediction of 1 (GDM-positive). Similarly, a positive relationship is observed between the child's birthweight and the GDM prediction. Furthermore, higher fasting glucose levels contribute positively to a prediction of 1. The insights observed in the SHAP beeswarm plot coincide with research on diagnosing GDM: the offspring of a GDM mother is often heavier than normal (Alwash et al., Citation2021), and such patients' fasting glucose and visceral fat are higher than normal (Perna et al., Citation2015).

  • SHAP Waterfall plot: The SHAP waterfall plot deconstructs a classifier prediction into the individual contributions of the features, providing a local explanation for one patient. The shift from the expected value is depicted in this plot; shifts to the right contribute to a GDM-positive prediction (Khanna, Chadaga, Sampathila, Chadaga, et al., Citation2023). The plot in Figure 7 shows an instance where the STACK predicts the patient as GDM-positive. As in the beeswarm plot, the features at the top contribute the most to the prediction. Here, the classifier associated the positive GDM prediction with high first fasting glucose, high maternal visceral fat and a higher child's weight at birth; the patient also had a higher BMI. The classifier suggests that these features contributed most to this patient being predicted GDM-positive, and the explanation coincides with GDM diagnosis research (Alwash et al., Citation2021).

  • SHAP Force plot: The force plot is another powerful SHAP visualization that assists in understanding local interpretations of the classifiers, with each significant feature seen affecting the output. Figure 8 represents the force plot for a patient whom the STACK classified as GDM-negative. The bar size dedicated to each feature represents its contribution to the classifier prediction, and features with greater impact lie closer to the dividing boundary of the force plot (Khanna, Chadaga, Sampathila, Chadaga, et al., Citation2023). Features in red contribute to a prediction of 1, and features in blue contribute to a prediction of 0. In Figure 8, most features are in blue, indicating that the patient is more likely to be screened as negative. This patient has high visceral fat; however, the overall contribution of the remaining features, such as BMI and blood pressure, resulted in a negative prediction.

Figure 6. SHAP beeswarm plot.


Figure 7. SHAP local waterfall plot.


Figure 8. SHAP local force plot.

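A minimal sketch of how SHAP explanations such as those in Figures 6–8 could be produced is given below. Because the stacked ensemble is not a single tree model, the model-agnostic KernelExplainer is used here; the background sample size, the variable names (stack, X_train, X_test) and the exact output layout of shap_values (which varies across SHAP versions) are assumptions.

```python
# SHAP sketch for the stacked model; variable names are placeholders.
import shap

background = shap.sample(X_train, 50)                # small background sample for speed
explainer = shap.KernelExplainer(stack.predict_proba, background)
shap_values = explainer.shap_values(X_test)          # typically one array per output class

# Global beeswarm-style summary for the GDM-positive class (cf. Figure 6)
shap.summary_plot(shap_values[1], X_test, feature_names=list(X_test.columns))

# Local force plot for a single patient (cf. Figure 8)
shap.force_plot(explainer.expected_value[1], shap_values[1][0],
                X_test.iloc[0], matplotlib=True)
```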

3.2.2. Local Interpretable Model-agnostic Explanations

LIME is an XAI technique that provides local interpretations of ML predictions for each instance, giving insight into how variations in the data affect the predictions obtained by ML models. This method was developed by Ribeiro et al. (Citation2016). LIME aims to reveal the inner workings of black-box prediction systems and can assist in identifying issues related to information leakage, model robustness and bias. It offers a high-level, logical explanation of the rationale behind a prediction. In this study, LIME is deployed for the STACK classifier. Figure 9 illustrates two cases, a positive and a negative prediction. Features shown in orange contribute positively to a GDM-positive prediction, features in blue contribute negatively, and the length of each bar depicts the magnitude of the feature's contribution. Plot 9(a) demonstrates that factors such as a high fasting glucose level, high BMI, high visceral fat and multiple pregnancies contributed to a 0.99 probability of this patient being predicted as GDM-positive.

Figure 9. LIME plots: (a) local prediction for a patient predicted positive, (b) local prediction for a patient predicted negative.


Furthermore, plot 9(b) indicates that most features in blue have contributed to the negative outcome of 0. Features like first fasting glucose, BMI and mean diastole of the patient are in the normal range. Hence, the probability of the patient being classified as negative is 0.97. These findings reflect the literature wherein multiple parameters contribute to a GDM prediction (Alwash et al., Citation2021).
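A short sketch of a local LIME explanation like those in Figure 9 is shown below; the class names, the number of reported features and the variable names (stack, X_train, X_test) are illustrative assumptions.

```python
# LIME sketch for one patient classified by the stacked model; names are placeholders.
import numpy as np
from lime.lime_tabular import LimeTabularExplainer

lime_explainer = LimeTabularExplainer(
    training_data=np.asarray(X_train),
    feature_names=list(X_train.columns),
    class_names=["GDM-negative", "GDM-positive"],
    mode="classification",
)

exp = lime_explainer.explain_instance(
    np.asarray(X_test)[0], stack.predict_proba, num_features=8
)
print(exp.as_list())   # (feature condition, weight) pairs for this local prediction
```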

3.2.3. ELI5

This Python package is another XAI tool that aids classifier interpretability and debugging. ELI5 stands for 'Explain Like I'm 5'. It can provide explanations for widely used tree-based models. Figure 10 presents three plots obtained from ELI5 explaining the design and predictions of the DT classifier, which achieved an accuracy of 93% and an F1 score of 0.92. Plot 10(a) is a table indicating which features were assigned the most weight when building the DT; the features are arranged from top to bottom in order of decreasing weight. The topmost feature is 'child's birthweight (g)'. This feature has the highest weight and is used as the root node when constructing the tree, indicating the strong relation between GDM and the child's birthweight. Plot 10(b) represents the DT itself. This in-depth view of the classifier helps debug the model and conduct medical validation. The DT has seven levels, and each node is split based on the Gini index criterion. Plots 10(a) and 10(b) provide a global explanation of the ML model, whereas plot 10(c) depicts a local prediction for a data point categorized as GDM-positive: the values of the most significant features and their contributions to the prediction are tabulated. The high values of maternal BMI, visceral fat and multiple pregnancies contributed to the GDM prediction (Liu et al., Citation2020), and the sum of the individual contributions led to a prediction of 1.

Figure 10. ELI5 plots: (a) the features and the weight assigned to each feature for building a tree [features x0, x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11 and x12 correspond to age (years), ethnicity, diabetes mellitus, mean diastolic BP (mmHg), mean systolic BP (mmHg), central armellini fat (mm), current gestational age, pregnancies (number), first fasting glucose (mg/dl), BMI pregestational (kg/m²), gestational age at birth, type of delivery and child's birthweight (g), respectively]; (b) a decision tree with multiple conditional nodes split based on the Gini index; and (c) a local prediction for a single patient predicted GDM-positive.

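A brief sketch of how ELI5 can produce the global weights and a local explanation for a fitted decision tree is shown below; dt_model, X_test and the feature names are placeholders. In a notebook, eli5.show_weights and eli5.show_prediction render the same information as HTML tables.

```python
# ELI5 sketch for a fitted decision tree; dt_model and X_test are assumed to exist.
import eli5

feature_names = list(X_train.columns)

# Global view: weights assigned to each feature by the fitted decision tree (cf. plot 10(a))
print(eli5.format_as_text(eli5.explain_weights(dt_model, feature_names=feature_names)))

# Local view: contribution of each feature to one patient's prediction (cf. plot 10(c))
print(eli5.format_as_text(
    eli5.explain_prediction(dt_model, X_test.iloc[0], feature_names=feature_names)
))
```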

3.2.4. Anchor

There is a constant trade-off between model complexity and interpretability. Anchor is a high-precision, model-agnostic XAI tool proposed by Ribeiro et al. (Citation2018) that can explain complex models. The technique generates specific feature conditions or rules, known as anchors, to interpret an observation and the observations around it precisely, and it provides better generalizability than LIME. An anchor also applies to the neighboring points of the observation of interest, which is quantified by its coverage. In this study, Anchor explained the predictions made by the RF. Table 5 contains the outputs for 10 patients, of which five were predicted negative and five GDM-positive. The first patient was predicted negative with the explanation that the child's birthweight was less than or equal to 3045 g and the first fasting glucose was less than or equal to 80 mg/dl. These conditions were associated with a normal (GDM-negative) patient with a precision of 99% and a coverage of 19%. Similar observations can be made for the remaining negative patients, wherein the child's birthweight, the number of pregnancies and the fasting glucose were used to form conditional anchors.

Table 5. Anchor prediction explanations.

Furthermore, the sixth to tenth rows of Table 5 depict anchor explanations for five positive patients. In the sixth row, a first fasting glucose above 90 mg/dl and a central armellini fat above 62.61 mm are the anchors produced to explain the positive prediction made by the RF, with a precision of 99% and a coverage of 10%. Values above certain thresholds for the first fasting glucose, BMI and central visceral fat are associated with a positive GDM prediction. The generated anchors align with GDM research (Alwash et al., Citation2021; Sesmilo et al., Citation2020).
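One way to generate anchor rules like those in Table 5 is with the alibi implementation of Ribeiro et al.'s Anchors, sketched below; rf_model, X_train, X_test, the discretization percentiles and the precision threshold are assumptions rather than the exact setup used.

```python
# Anchor sketch for the random forest using the alibi library; names and thresholds are assumed.
import numpy as np
from alibi.explainers import AnchorTabular

predict_fn = lambda x: rf_model.predict(x)            # class-label predictor expected by AnchorTabular
anchor_explainer = AnchorTabular(predict_fn, feature_names=list(X_train.columns))
anchor_explainer.fit(np.asarray(X_train), disc_perc=(25, 50, 75))

explanation = anchor_explainer.explain(np.asarray(X_test)[0], threshold=0.95)
print("Anchor:", " AND ".join(explanation.anchor))    # conditional rules (cf. Table 5)
print("Precision:", explanation.precision)
print("Coverage:", explanation.coverage)
```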

3.2.5. Quantum lattice (Qlattice)

Qlattice is a supervised ML algorithm invented by Abzu, inspired by Richard Feynman's path integral formulation. The model searches among numerous candidate models to find an appropriate fit to the data (Broløs et al., Citation2021) and, similar to neural networks, is updated iteratively to improve that fit. Furthermore, it aims to select the right number of features and analyze their interactions to produce an outcome. The high-level design of the resulting model can be visualized as a Qgraph, as shown in Figure 11. This Qlattice model gives the most significance to the number of pregnancies and the maternal first fasting glucose in producing its output. Equation (9) shows the mathematical expression that can be used to interpret the model. Compared with the previously deployed XAI tools, Qlattice likewise gives importance to the number of pregnancies and the glucose level when predicting GDM (Alwash et al., Citation2021).

(9) Qlattice equation = logreg(0.143 · first fasting glucose − 13.02 − 0.4e17 · pregnancies (number))

Figure 11. Qgraph for the Qlattice prediction model.


3.2.6. Feature importance

Feature importance is a method of estimating scores for all input features based on their effect on the prediction; a higher score implies a greater effect on the classification. Figure 12 presents the feature importances obtained from tree-based models such as DT, RF, AdaBoost, Extratrees, Light GBM, XGBoost and CatBoost, compared with the SHAP feature contribution plot for the STACK. Most plots indicate that the most essential feature is the first fasting glucose, which maps onto the GDM literature, as the glucose level is a critical parameter when diagnosing GDM (Perna et al., Citation2015). The LIME plots for the STACK give similar observations. Comparing the ELI5 output for the DT with the DT feature importance graph, the child's birthweight, the visceral fat and the mean diastolic value appear in both XAI tools; however, the DT feature importance graph identifies the glucose level as the most significant feature, which is missing from the ELI5 explanation. Comparing the Anchor results with the RF feature importance plot, the top five features in the plot are the fasting glucose, the visceral fat deposit, the mean diastole, the BMI and the child's birthweight; all of these features were used as anchors to explain both positive and negative predictions, indicating the validity of the anchors. Furthermore, the CatBoost classifier emphasized the visceral fat and the number of pregnancies, while the SHAP feature contribution plot assigned higher importance to the child's weight at birth, the pregnancy number and the central fat deposit. Comparing these feature importance plots with the previously deployed SHAP, LIME, ELI5, Anchor and Qlattice architectures, most plots align with research associating maternal fasting glucose levels, BMI, visceral fat levels, the child's weight at birth, the number of pregnancies and blood pressure with GDM. This study observed that the mother's ethnicity and the delivery type do not significantly increase the probability of a particular prediction.

Figure 12. Feature importance plots: (a) Decision Tree, (b) Random Forest, (c) AdaBoost, (d) Extratrees, (e) Light GBM, (f) XGBoost, (g) CatBoost, (h) SHAP feature contribution plot for STACK.

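Impurity-based feature importances such as those plotted in Figure 12 can be read directly from a fitted tree ensemble; a short sketch for the random forest is shown below, with rf_model and X_train as placeholder names. The same feature_importances_ attribute exists for the other tree-based models used.

```python
# Feature importance sketch for one tree-based model; rf_model and X_train are placeholders.
import matplotlib.pyplot as plt
import pandas as pd

importances = pd.Series(rf_model.feature_importances_, index=X_train.columns)
importances.sort_values().plot(kind="barh")    # horizontal bar chart, most important on top
plt.xlabel("Importance")
plt.title("Random Forest feature importance")
plt.tight_layout()
plt.show()
```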

3.3. Discussions

In this study, we compared the performance of different classifiers when trained with five different balancing techniques. Various ensemble classifiers were also deployed to provide an elaborate comparison of widely implemented ML pipelines. Five XAI techniques, namely SHAP, LIME, ELI5, Qlattice and Anchor, were used to explain some of the best-performing classifiers, and their outcomes were compared with and validated against seven tree-based feature importance techniques.

Multiple traditional features for detecting gestational diabetes were considered. First fasting glucose, BMI, blood pressure variables, the number of pregnancies, a previous history of diabetes and maternal age are primary features considered in much GDM detection research (Mennickent et al., Citation2022). Studies suggest that GDM is associated with high first fasting glucose levels (McIntyre et al., Citation2019), and hormonal changes can increase the mother's BMI and blood pressure, raising the risk of pre-eclampsia (Damm et al., Citation2016). An extensive review of the medical literature found multiple articles agreeing that GDM risk increases with the number of pregnancies (Egan et al., Citation2021). The SHAP waterfall, LIME and Qlattice plots align with the above research, indicating the significance of these primary features. However, it was discovered that the most critical features after the first fasting glucose were the visceral fat level and the child's birthweight. This strong association was observed in all the SHAP, ELI5 and Anchor interpretations. These are two unconventional features absent from recent GDM prognosis and diagnosis studies, yet their importance was validated by all seven tree-based feature importance plots. GDM is linked with a higher amount of visceral fat, which aligns with medical research (Alwash et al., Citation2021); this feature could assist in detecting GDM during pregnancy. The child's birthweight, despite its strong association with GDM, may not be useful for detecting GDM during pregnancy; however, it does reflect the mother's and the child's health. Medical follow-ups and postnatal care for such mothers are critical, as studies suggest that even after childbirth, GDM patients are at high risk of developing chronic diabetes mellitus (Damm et al., Citation2016).

Previously, multiple studies have been conducted to detect the presence of GDM; these studies deployed different datasets and obtained varying performances. Xiong et al. (Citation2022) demonstrated the effect of patient biomarkers on the prediction of GDM, highlighting that thrombin levels contributed the most toward accurately detecting a GDM patient; their SVM pipeline produced the highest AUC score of 0.98. Du et al. (Citation2022) conducted SMOTE for data balancing and then trained five classifiers on blood biomarkers and demographic parameters; SVM produced the highest performance with an accuracy of 76.1%, and SHAP force plots and feature contribution plots added a layer of explainability to the clinical decision support system. In research conducted by Gnanadass (Citation2020), various features related to blood glucose, blood pressure and body weight were studied; among eight evaluated pipelines, XGBoost had the highest accuracy of 77.4%. In Liu et al. (Citation2021), feature importance with XGBoost assisted in identifying significant features, and the XGBoost classifier outperformed the logistic model with a higher AUC score of 0.742; features like pre-pregnancy BMI, maternal age and fasting glucose contributed significantly to the GDM prediction. Bhat et al. (Citation2022) proposed a study on predicting type-2 diabetes using a dataset of 210 samples and nine attributes; a combination of RF, DT and LR produced an accuracy of 99.34%. Shafi and Ansari (Citation2021) published a study using ML techniques to classify diabetes samples and provide an early prediction; they used SVM, naïve Bayes and DT on the Pima Indian dataset, obtaining an accuracy of 74.28%. A study by Wu et al. (Citation2021) provided an early prediction of gestational diabetes in a Chinese population; the dataset had 16,819 samples, out of which 14,992 had gestational diabetes, and 72 variables were considered. A deep neural network model was constructed, achieving an AUC score of 0.80, and the impact of thyroxine and BMI on the prediction of GDM was explored.

4. Conclusion and future scope

GDM is a condition wherein hyperglycemia is diagnosed during pregnancy, with insulin resistance causing elevated glucose levels. The patient is at high risk of high blood pressure, pre-eclampsia and chronic type-2 diabetes post-pregnancy, and maternal GDM can endanger fetal development. Understanding the factors associated with GDM can enhance detection and knowledge of the disorder. In this study, we compared and evaluated multiple ML pipelines and analyzed the effect of various data balancing techniques. A stacking model representing an ensemble of all deployed ML models, trained on SMOTE-ENN data, gave the highest performance with an accuracy, F1 score and AUC score of 96%, 0.97 and 0.96, respectively. Furthermore, the best-performing models were interpreted with five explainable AI techniques: SHAP, LIME, ELI5, Qlattice and Anchor. The XAI models assisted in validating the association of visceral fat levels and the child's weight at birth with GDM.

A graphical user interface (GUI) for this study could assist in screening GDM patients in real time, and the proposed frameworks could be scaled to a larger population with more parameters. This GUI could be integrated into Obstetrics and Gynecology Electronic Medical Records for automated screening. However, intensive medical validation and testing with a larger, high-quality dataset should be conducted before these frameworks are deployed in practice. When used in medical facilities, advancements in XAI can assist in bridging the gap between the medical and informatics domains.

Author contributions

V.V.K. performed the experiments and wrote this manuscript. K.C. and N.S. provided the design ideas. S.P. performed model development. R.C.P., D.B. and S.K.S. validated the results.

Open access

Open access funding will be provided by Manipal Academy of Higher Education.

Acknowledgements

We would like to thank the Manipal Academy of Higher Education for providing us an opportunity to conduct this research.

Disclosure statement

The authors declare no competing interest.

Data availability statement

The dataset is available using the following link. https://www.kaggle.com/datasets/kamyababedi/visceral-adipose.

References

  • Alwash, S. M., McIntyre, H. D., & Mamun, A. (2021). The association of general obesity, central obesity and visceral body fat with the risk of gestational diabetes mellitus: Evidence from a systematic review and meta-analysis. Obesity Research & Clinical Practice, 15(5), 425–430. https://doi.org/10.1016/j.orcp.2021.07.005
  • Anguita, D., Ghelardoni, L., Ghio, A., Oneto, L., & Ridella, S. (2012, April 25–27). The ‘K’ in K-fold cross validation. In ESANN 2012 Proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges, Belgium (pp. 441–446). i6doc Publishers.
  • Anguita, D., Ghio, A., Greco, N., Oneto, L., & Ridella, S. (2010, July 18–23). Model selection for support vector machines: Advantages and disadvantages of the machine learning theory. In 2010 International Joint Conference on Neural Networks (IJCNN), Barcelona, Spain (pp. 1–8). IEEE. https://doi.org/10.1109/Ijcnn.2010.5596450
  • Bhat, S. S., Selvam, V., Ansari, G. A., & Ansari, M. D. (2022, November 25–27). Hybrid prediction model for type-2 diabetes mellitus using machine learning approach. In 2022 Seventh International Conference on Parallel, Distributed and Grid Computing (PDGC), Solan, Himachal Pradesh, India (pp. 150–155). IEEE. https://doi.org/10.1109/PDGC56933.2022.10053092
  • Broløs, K. R., Machado, M. V., Cave, C., Kasak, J., Stentoft-Hansen, V., Batanero, V. G., Jelen, T., & Wilstrup, C. (2021). An approach to symbolic regression using Feyn [arXiv preprint arXiv:2104.05417]. https://doi.org/10.48550/arXiv.2104.05417
  • Butucea, C., Ndaoud, M., Stepanova, N. A., & Tsybakov, A. B. (2018). Variable selection with Hamming loss. Annals of Statistics, 46(5), 1837–1875.
  • Chadaga, K., Prabhu, S., Bhat, V., Sampathila, N., Umakanth, S., & Chadaga, R. (2023). A decision support system for diagnosis of COVID-19 from non-COVID-19 influenza-like illness using explainable artificial intelligence. Bioengineering, 10(4), 439. https://doi.org/10.3390/bioengineering10040439
  • Chan, Y. N., Wang, P., Chun, K. H., Lum, J. T. S., Wang, H., Zhang, Y., & Leung, K. S. Y. (2023). A machine learning approach for early prediction of gestational diabetes mellitus using elemental contents in fingernails. Scientific Reports, 13(1), 4184. https://doi.org/10.1038/s41598-023-31270-y
  • Chicco, D., & Jurman, G. (2020). The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics, 21(1), 6. https://doi.org/10.1186/s12864-019-6413-7
  • Damm, P., Houshmand-Oeregaard, A., Kelstrup, L., Lauenborg, J., Mathiesen, E. R., & Clausen, T. D. (2016). Gestational diabetes mellitus and long-term consequences for mother and offspring: A view from Denmark. Diabetologia, 59(7), 1396–1399. https://doi.org/10.1007/s00125-016-3985-5
  • de Oliveira, G. P., Fonseca, A., & Rodrigues, P. C. (2022). Diabetes diagnosis based on hard and soft voting classifiers combining statistical learning models. Brazilian Journal of Biometrics, 40(4), 415–427. https://doi.org/10.28951/bjb.v40i4.605
  • Du, Y., Rafferty, A. R., McAuliffe, F. M., Wei, L., & Mooney, C. (2022). An explainable machine learning-based clinical decision support system for prediction of gestational diabetes mellitus. Scientific Reports, 12(1), 1170. https://doi.org/10.1038/s41598-022-05112-2
  • Egan, A. M., Enninga, E. A. L., Alrahmani, L., Weaver, A. L., Sarras, M. P., & Ruano, R. (2021). Recurrent gestational diabetes mellitus: A narrative review and single-center experience. Journal of Clinical Medicine, 10(4), 569. https://doi.org/10.3390/jcm10040569
  • Ferreira, P., Le, D. C., & Zincir-Heywood, N. (2019, October 21–25). Exploring feature normalization and temporal information for machine learning based insider threat detection. In 2019 15th International Conference on Network and Service Management (CNSM), Halifax, NS, Canada (pp. 1–7). IEEE. https://doi.org/10.23919/CNSM46954.2019.9012708
  • Gnanadass, I. (2020). Prediction of gestational diabetes by machine learning algorithms. IEEE Potentials, 39(6), 32–37. https://doi.org/10.1109/MPOT.2020.3015190
  • Gong, L., Jiang, S., & Jiang, L. (2019). Tackling class imbalance problem in software defect prediction through cluster-based over-sampling with filtering. IEEE Access, 7, 145725–145737. https://doi.org/10.1109/ACCESS.2019.2945858
  • Han, H., Wang, W. Y., & Mao, B. H. (2005). Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In D. S. Huang, X. P. Zhang, & G. B. Huang (Eds), Advances in Intelligent Computing. ICIC 2005. Lecture Notes in Computer Science (Vol. 3644, pp. 878–887). Springer.
  • Jonathan, B., Putra, P. H., & Ruldeviyani, Y. (2020, July 7–8). Observation imbalanced data text to predict users selling products on female daily with SMOTE, Tomek, and SMOTE-Tomek. In 2020 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT), Bali, Indonesia (pp. 81–85). IEEE. https://doi.org/10.1109/IAICT50021.2020.9172033
  • Junker, M., Hoch, R., & Dengel, A. (1999, September 22). On the evaluation of document analysis components by recall, precision, and accuracy. In Proceedings of the Fifth International Conference on Document Analysis and Recognition. ICDAR'99 (Cat. No. PR00318), Bangalore, India (pp. 713–716). IEEE.
  • Kaggle. (2023). Visceral adipose tissue measurements during pregnancy. https://www.kaggle.com/datasets/kamyababedi/visceral-adipose
  • Kaur, P., Sharma, M., & Mittal, M. (2018). Big data and machine learning based secure healthcare framework. Procedia Computer Science, 132, 1049–1059. https://doi.org/10.1016/j.procs.2018.05.020
  • Kaushik, B., Sharma, R., Dhama, K., Chadha, A., & Sharma, S. (2023). Performance evaluation of learning models for intrusion detection system using feature selection. Journal of Computer Virology and Hacking Techniques, 19(4), 529–548. https://doi.org/10.1007/s11416-022-00460-z
  • Khanna, V. V., Chadaga, K., Sampathila, N., Chadaga, R., Prabhu, S., K S, S., Jagdale, A. S., & Bhat, D. (2023). A decision support system for osteoporosis risk prediction using machine learning and explainable artificial intelligence. Heliyon, 9(12), e22456. https://doi.org/10.1016/j.heliyon.2023.e22456
  • Khanna, V. V., Chadaga, K., Sampathila, N., Prabhu, S., & Chadaga, R. (2023). A machine learning and explainable artificial intelligence triage-prediction system for COVID-19. Decision Analytics Journal, 7, 100246. https://doi.org/10.1016/j.dajour.2023.100246
  • Koto, F. (2014, October 18–19). SMOTE-Out, SMOTE-Cosine, and selected-SMOTE: An enhancement strategy to handle imbalance in data level. In 2014 International Conference on Advanced Computer Science and Information System, Jakarta, Indonesia (pp. 280–284). IEEE.
  • Liu, B., Song, L., Zhang, L., Wang, L., Wu, M., Xu, S., Cao, Z., & Wang, Y. (2020). Higher numbers of pregnancies associated with an increased prevalence of gestational diabetes mellitus: Results from the Healthy Baby Cohort Study. Journal of Epidemiology, 30(5), 208–212. https://doi.org/10.2188/jea.JE20180245
  • Liu, H., Li, J., Leng, J., Wang, H., Liu, J., Li, W., Liu, H., Wang, S., Ma, J., Chan, J. C., Yu, Z., Hu, G., Li, C., & Yang, X. (2021). Machine learning risk score for prediction of gestational diabetes in early pregnancy in Tianjin, China. Diabetes/Metabolism Research and Reviews, 37(5), e3397. https://doi.org/10.1002/dmrr.3397
  • Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.) Advances in neural information processing systems (Vol. 30, pp. 4765–4774). Curran Associates.
  • McIntyre, H. D., Catalano, P., Zhang, C., Desoye, G., Mathiesen, E. R., & Damm, P. (2019). Gestational diabetes mellitus. Nature Reviews. Disease Primers, 5(1), 47. https://doi.org/10.1038/s41572-019-0098-8
  • Mennickent, D., Rodríguez, A., Farías-Jofré, M., Araya, J., & Guzmán-Gutiérrez, E. (2022). Machine learning-based models for gestational diabetes mellitus prediction before 24–28 weeks of pregnancy: A review. Artificial Intelligence in Medicine, 132, 102378. https://doi.org/10.1016/j.artmed.2022.102378
  • Muntasir Nishat, M., Faisal, F., Jahan Ratul, I., Al-Monsur, A., Ar-Rafi, A. M., Nasrullah, S. M., Reza, M. T., & Khan, M. R. H. (2022). A comprehensive investigation of the performances of different machine learning classifiers with SMOTE-ENN oversampling technique and hyperparameter optimization for imbalanced heart failure dataset. Scientific Programming, 2022, 1–17. https://doi.org/10.1155/2022/3649406
  • Nasarian, E., Abdar, M., Fahami, M. A., Alizadehsani, R., Hussain, S., Basiri, M. E., Zomorodi-Moghadam, M., Zhou, X., Pławiak, P., Acharya, U. R., Tan, R.-S., & Sarrafzadegan, N. (2020). Association between work-related features and coronary artery disease: A heterogeneous hybrid feature selection integrated with balancing approach. Pattern Recognition Letters, 133, 33–40. https://doi.org/10.1016/j.patrec.2020.02.010
  • No, A. (2019). Universality of logarithmic loss in fixed-length lossy compression. Entropy, 21(6), 580. https://doi.org/10.3390/e21060580
  • Peng, C., & Cheng, Q. (2020). Discriminative ridge machine: A classifier for high-dimensional data or imbalanced data. IEEE Transactions on Neural Networks and Learning Systems, 32(6), 2595–2609. https://doi.org/10.1109/TNNLS.2020.3006877
  • Perna, R., Loughan, A. R., Le, J., & Tyson, K. (2015). Gestational diabetes: Long-term central nervous system developmental and cognitive sequelae. Applied Neuropsychology. Child, 4(3), 217–220. https://doi.org/10.1080/21622965.2013.874951
  • Ranjan, G. S. K., Verma, A. K., & Radhika, S. (2019, March 29–31). K-nearest neighbors and grid search cv based real time fault monitoring system for industries. In 2019 IEEE 5th International Conference for Convergence in Technology (I2CT), Bombay, India (pp. 1–5). IEEE. https://doi.org/10.1109/I2CT45611.2019.9033691
  • Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why should I trust you?” Explaining the predictions of any classifier. In B. Krishnapuram & M. Shah (Eds.), KDD'16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135–1144). Association for Computing Machinery.
  • Ribeiro, M. T., Singh, S., & Guestrin, C. (2018). Anchors: High-precision model-agnostic explanations. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 32, No. 1). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v32i1.11491
  • Rodrigues, T., & Bernardes, G. J. (2020). Machine learning for target discovery in drug development. Current Opinion in Chemical Biology, 56, 16–22. https://doi.org/10.1016/j.cbpa.2019.10.003
  • Santosh, K. C., & Gaur, L. (2021). Introduction to AI in public health. In K. C. Santosh & L. Gaur (Eds.), Artificial intelligence and machine learning in public healthcare: Opportunities and societal impact (pp. 1–10). Springer.
  • Sesmilo, G., Prats, P., Garcia, S., Rodríguez, I., Rodríguez-Melcón, A., Berges, I., & Serra, B. (2020). First-trimester fasting glycemia as a predictor of gestational diabetes (GDM) and adverse pregnancy outcomes. Acta Diabetologica, 57(6), 697–703. https://doi.org/10.1007/s00592-019-01474-8
  • Shaat, N., & Groop, L. (2007). Genetics of gestational diabetes mellitus. Current Medicinal Chemistry, 14(5), 569–583. https://doi.org/10.2174/092986707780059643
  • Shafi, S., & Ansari, G. A. (2021, April 29–30). Early prediction of diabetes disease & classification of algorithms using machine learning approach [Paper presentation]. Proceedings of the International Conference on Smart Data Intelligence (ICSMDI 2021), Trichy, Tamil Nadu, India.
  • Thekadayil, A. J. (2018). Product bundle recommendation using collaboration of matrix factorization and Jaccard similarity [Doctoral dissertation]. National College of Ireland.
  • Wu, Y.-T., Zhang, C.-J., Mol, B. W., Kawai, A., Li, C., Chen, L., Wang, Y., Sheng, J.-Z., Fan, J.-X., Shi, Y., & Huang, H.-F. (2021). Early prediction of gestational diabetes mellitus in the Chinese population via advanced machine learning. The Journal of Clinical Endocrinology and Metabolism, 106(3), e1191–e1205. https://doi.org/10.1210/clinem/dgaa899
  • Xiong, Y., Lin, L., Chen, Y., Salerno, S., Li, Y., Zeng, X., & Li, H. (2022). Prediction of gestational diabetes mellitus in the first 19 weeks of pregnancy using machine learning techniques. The Journal of Maternal-Fetal & Neonatal Medicine, 35(13), 2457–2463. https://doi.org/10.1080/14767058.2020.1786517
  • Zhang, Z., Yang, L., Han, W., Wu, Y., Zhang, L., Gao, C., Jiang, K., Liu, Y., & Wu, H. (2022). Machine learning prediction models for gestational diabetes mellitus: Meta-analysis. Journal of Medical Internet Research, 24(3), e26634. https://doi.org/10.2196/26634