Research Article

An interpretable predictive model for bank customers’ income using the eXtreme Gradient Boosting algorithm and the SHAP method: a case study of an Anonymous Chilean Bank

Article: 2312290 | Received 22 Aug 2023, Accepted 26 Jan 2024, Published online: 06 Feb 2024
 

Abstract

In the dynamic landscape of banking, accurate and timely information about customers’ incomes is crucial for managing financial product offerings effectively. To meet this need, banks build predictive models with numerous features, of which only a subset captures income variability. In this study, we propose a methodology for predicting monthly income with an XGBoost model that uses a reduced number of features. Features are selected with Boruta and BorutaSHAP while preserving predictive power. To make the model’s predictions transparent, we use the Shapley Additive Explanations (SHAP) method. The dataset, provided by an anonymous Chilean bank, consists of 10,000 records with 426 features and a substantial proportion of missing values. The results demonstrate that combining these feature selection methods with the XGBoost algorithm yields a more concise model that maintains predictive performance while reducing complexity and training time. By leveraging the SHAP method, financial institutions can adopt our methodology to consistently identify and track the most influential features without compromising predictive power.
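The SHAP explanations mentioned above rest on Shapley values. A feature’s contribution is its weighted average marginal effect on the prediction over all coalitions of the other features, and the contributions add up to the difference between the prediction and a baseline prediction. The sketch below computes exact Shapley values by brute force for a hypothetical three-feature income model. The model, its coefficients, the feature names (`card_spend`, `tenure_years`, `has_mortgage`) and the single all-zeros baseline are illustrative assumptions, not the study’s model or data. It also does not use the `shap` package. In practice, that package relies on faster algorithms (for example TreeSHAP for tree ensembles such as XGBoost), because enumerating all 2^n coalitions is feasible only for a handful of features, far fewer than the 426 in the study’s dataset.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, List

Features = Dict[str, float]


def shapley_values(model: Callable[[Features], float],
                   x: Features,
                   baseline: Features) -> Features:
    """Exact Shapley values of model(x) relative to a single baseline point.

    Features outside a coalition S take their baseline value, so
    v(S) = model(x on S, baseline elsewhere). All 2^n coalitions are
    enumerated, which is only practical for a few features.
    """
    names: List[str] = list(x)
    n = len(names)

    def value(coalition: frozenset) -> float:
        # Mix the explained point and the baseline according to the coalition.
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    phi: Features = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition S.
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_income_model(f: Features) -> float:
    # Hypothetical toy model: coefficients and feature names are illustrative only.
    return (400.0
            + 0.5 * f["card_spend"]
            + 30.0 * f["tenure_years"]
            + 200.0 * f["has_mortgage"]
            + 0.1 * f["card_spend"] * f["has_mortgage"])


# Hypothetical customer to explain and an all-zeros reference point.
x = {"card_spend": 1000.0, "tenure_years": 5.0, "has_mortgage": 1.0}
baseline = {"card_spend": 0.0, "tenure_years": 0.0, "has_mortgage": 0.0}

phi = shapley_values(toy_income_model, x, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:.2f}")

# Additivity: contributions sum to f(x) - f(baseline).
print(f"sum of contributions = {sum(phi.values()):.2f}")
print(f"f(x) - f(baseline) = {toy_income_model(x) - toy_income_model(baseline):.2f}")
```

Here f(x) = 1350 and f(baseline) = 400. The linear terms are credited entirely to their own features, while the 100-unit interaction term between `card_spend` and `has_mortgage` is split equally between them. The contributions are therefore 550, 150 and 250, which sum to 950.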

Acknowledgments

We thank the anonymous Chilean banking institution that made this study possible by providing a dataset of its customers.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 Data science-based analysis techniques come from an interdisciplinary field that employs scientific methods, processes, and systems to extract knowledge and insights from data in various forms. These techniques involve the use of statistical, mathematical, and computational approaches to examine, interpret, and understand datasets.