Research Article

An interpretable predictive model for bank customers’ income using the eXtreme Gradient Boosting algorithm and the SHAP method: a case study of an Anonymous Chilean Bank

Article: 2312290 | Received 22 Aug 2023, Accepted 26 Jan 2024, Published online: 06 Feb 2024

References

  • Alsahaf A, Petkov N, Shenoy V, Azzopardi G. 2022. A framework for feature selection through boosting. Expert Syst Appl. 187:115895.
  • Alsaleh N, Farooq B. 2021. Interpretable data-driven demand modelling for on-demand transit services. Transp Res Part A Policy Pract. 154:1–22.
  • Ambrey CL, Fleming CM. 2014. The causal effect of income on life satisfaction and the implications for valuing non-market goods. Econ Lett. 123(2):131–134.
  • Bergstra J, Yamins D, Cox D. 2013. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. In: International Conference on Machine Learning. PMLR. p. 115–123.
  • Breiman L. 2001. Random forests. Mach Learn. 45:5–32.
  • Burlacu M. 2016. The population’s income, expenses and savings as descriptive aspects of the standard of living. Ovidius Univ Ann Series Econ Sci. 16(2):175–180.
  • Bussolo M, Davalos ME, Peragine V, Sundaram R. 2018. Toward a new social contract: taking on distributional tensions in Europe and Central Asia. World Bank Publications.
  • Cai Q, Abdel-Aty M, Zheng O, Wu Y. 2022. Applying machine learning and Google Street View to explore effects of drivers’ visual environment on traffic safety. Transp Res Part C Emerg Technol. 135:103541.
  • Chen T, Guestrin C. 2016. XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. p. 785–794.
  • Chia G, Miller PW. 2008. Tertiary performance, field of study and graduate starting salaries. Aust Econ Rev. 41(1):15–31.
  • Fafoutellis P, Mantouka EG, Vlahogianni EI. 2022. Acceptance of a pay-how-you-drive pricing scheme for city traffic: the case of Athens. Transp Res Part A Policy Pract. 156:270–284.
  • Fan J, Wang X, Wu L, Zhou H, Zhang F, Yu X, Lu X, Xiang Y. 2018. Comparison of support vector machine and extreme gradient boosting for predicting daily global solar radiation using temperature and precipitation in humid subtropical climates: a case study in China. Energy Convers Manage. 164:102–111.
  • Fuders F. 2023. Economic resilience in the face of external shocks. In: How to Fulfil the UN sustainability goals: rethinking the role and concept of money in the light of sustainability. Cham: Springer. p. 327–344.
  • Futagami K, Fukazawa Y, Kapoor N, Kito T. 2021. Pairwise acquisition prediction with SHAP value interpretation. J Finance Data Sci. 7:22–44.
  • García-Alonso CR, Torres-Jiménez M, Hervás-Martínez C. 2010. Income prediction in the agrarian sector using product unit neural networks. Eur J Oper Res. 204(2):355–365.
  • Goldberg LR, Mouti S. 2022. Sustainable investing and the cross-section of returns and maximum drawdown. J Finance Data Sci. 8:353–387.
  • Gunning D, Stefik M, Choi J, Miller T, Stumpf S, Yang G-Z. 2019. XAI-explainable artificial intelligence. Sci Robot. 4(37):eaay7120.
  • Hughes G. 1968. On the mean accuracy of statistical pattern recognizers. IEEE Trans Inf Theory. 14(1):55–63.
  • Jaquart P, Dann D, Weinhardt C. 2021. Short-term bitcoin market prediction via machine learning. J Finance Data Sci. 7:45–66.
  • Keany E. 2020. BorutaShap: A wrapper feature selection method which combines the Boruta feature selection algorithm with Shapley values (Version 1.1). Zenodo.
  • Kibekbaev A, Duman E. 2016. Benchmarking regression algorithms for income prediction modeling. Inf Syst. 61:40–52.
  • Kursa MB, Rudnicki WR. 2010. Feature selection with the Boruta package. J Stat Softw. 36(11):1–13.
  • Ładyżyński P, Żbikowski K, Gawrysiak P. 2019. Direct marketing campaigns in retail banking with the use of deep learning and random forests. Expert Syst Appl. 134:28–35.
  • Lang X, Wu D, Mao W. 2022. Comparison of supervised machine learning methods to predict ship propulsion power at sea. Ocean Eng. 245:110387.
  • Lazar A. 2004. Income prediction via support vector machine. In: ICMLA. p. 143–149.
  • Lin K, Gao Y. 2022. Model interpretability of financial fraud detection by group SHAP. Expert Syst Appl. 210:118354.
  • Lundberg SM, Lee SI. 2017. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 30.
  • Lundberg SM, Erion GG, Lee SI. 2018. Consistent individualized feature attribution for tree ensembles. arXiv preprint arXiv:1802.03888.
  • Matkowski M. 2021. Prediction of individual income: A machine learning approach. Bryant Online Repository. Rhode Island: Bryant University.
  • Miller T. 2017. Explanation in artificial intelligence: insights from the social sciences. Artif Intell. 267:1–38.
  • Mokhtari KE, Higdon BP, Başar A. 2019. Interpreting financial time series with SHAP values. In: Proceedings of the 29th Annual International Conference on Computer Science and Software Engineering. p. 166–172.
  • Molnar C. 2020. Interpretable machine learning. Lulu.com.
  • Padarian J, McBratney AB, Minasny B. 2020. Game theory interpretation of digital soil mapping convolutional neural networks. Soil. 6(2):389–397.
  • Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. 2011. Scikit-learn: Machine learning in Python. J Mach Learn Res. 12(Oct):2825–2830.
  • Pinelis M, Ruppert D. 2022. Machine learning portfolio allocation. J Finance Data Sci. 8:35–54.
  • Ross G, Das S, Sciro D, Raza H. 2021. CapitalVX: a machine learning model for startup selection and exit prediction. J Finance Data Sci. 7:94–114.
  • Salas P, De la Fuente R, Astroza S, Carrasco JA. 2022. A systematic comparative evaluation of machine learning classifiers and discrete choice models for travel mode choice in the presence of response heterogeneity. Expert Syst Appl. 193:116253.
  • Shapley LS. 1953. A value for n-person games. Contrib Theory Games. 2(28):307–317.
  • Shin S, Austin PC, Ross HJ, Abdel-Qadir H, Freitas C, Tomlinson G, Chicco D, Mahendiran M, Lawler PR, Billia F, Gramolini A, Epelman S, Wang B, Lee DS. 2021. Machine learning vs. conventional statistical models for predicting heart failure readmission and mortality. ESC Heart Fail. 8(1):106–115.
  • Silva CAO, Gonzalez-Otero R, Bessani M, Mendoza LO, de Castro CL. 2022a. Interpretable risk models for sleep apnea and coronary diseases from structured and non-structured data. Expert Syst Appl. 200:116955.
  • Silva I, Ferreira C, Costa L, Sóter M, Carvalho L, Albuquerque DCJ, Sales M, Candido A, Reis F, Veloso A, et al. 2022b. Polycystic ovary syndrome: clinical and laboratory variables related to new phenotypes using machine-learning models. J Endocrinol Invest. 45:497–505.
  • Singh GD, Vig H, Kumar A. 2021. A data visualization approach for predicting the income class of the population. In: 2021 5th International Conference on Electronics, Communication and Aerospace Technology (ICECA). IEEE. p. 1042–1047.
  • Smart JC. 1988. College influences on graduates’ income levels. Res High Educ. 29:41–59.
  • Swan N. 2006. Problems in dynamic modeling of individual incomes. In: Swedish Conference on Microsimulation, Stockholm. Vol. 20.
  • Thomas SL. 2000. Deferred costs and economic returns to college major, quality, and performance. Res High Educ. 41:281–313.
  • Trunk GV. 1979. A problem of dimensionality: a simple example. IEEE Trans Pattern Anal Mach Intell. PAMI-1(3):306–307.
  • Vythoulkas PC, Koutsopoulos HN. 2003. Modeling discrete choice behavior using concepts from fuzzy set theory, approximate reasoning and neural networks. Transp Res Part C Emerg Technol. 11(1):51–73.
  • Wang H, Liu C, Deng L. 2018. Enhanced prediction of hot spots at protein-protein interfaces using extreme gradient boosting. Sci Rep. 8(1):1–13.
  • Wu J, Chen X-Y, Zhang H, Xiong L-D, Lei H, Deng S-H. 2019. Hyperparameter optimization for machine learning models based on Bayesian optimization. J Electron Sci Technol. 17(1):26–40.
  • Yu D, Liu Z, Su C, Han Y, Duan X, Zhang R, Liu X, Yang Y, Xu S. 2020a. Copy number variation in plasma as a tool for lung cancer prediction using extreme gradient boosting (XGBoost) classifier. Thorac Cancer. 11(1):95–102.
  • Yu GB, Lee D-J, Sirgy MJ, Bosnjak M. 2020b. Household income, satisfaction with standard of living, and subjective well-being: the moderating role of happiness materialism. J Happiness Stud. 21:2851–2872.