Research Article

Critical Factor Analysis for prediction of Diabetes Mellitus using an Inclusive Feature Selection Strategy

Article: 2331919 | Received 19 Aug 2023, Accepted 09 Mar 2024, Published online: 01 Apr 2024
