Econometrics

Credit risk prediction with and without weights of evidence using quantitative learning models

Modisane B. Seitshiro & Seshni Govender
Article: 2338971 | Received 14 Aug 2023, Accepted 01 Apr 2024, Published online: 15 Apr 2024
 

Abstract

Credit risk assessment is necessary for maintaining financial stability, cost and time efficiency, and model accuracy in the commercial banking sector, and it supports comparability analysis and the assessment of future business implications. By accurately predicting credit risk, highly regulated banks can make informed lending decisions and minimize potential financial losses. The purpose of this paper is to assess the power of conventional predictive statistical models with and without feature transformation to gain better insight into customers' creditworthiness. The predictive performance of the logistic regression model is compared with that of machine learning models for credit risk assessment using commercial banking credit registry data. Each model has its strengths and weaknesses, and where one model falls short, another performs better. The article reveals that simpler credit risk assessment techniques delivered outstanding performance while consuming less processing power, and gave insight into the feature categories that contribute most. Attempting to improve a conventional predictive statistical model with some of the feature transformations reduced overall model performance, specifically for credit registry data. The logistic regression model outperformed all other models, with the highest F1, accuracy, Jaccard index and AUC values.
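As an illustration of the four evaluation measures named above, the following is a minimal sketch in plain Python of how F1, accuracy, the Jaccard index and AUC can be computed for a binary default / non-default classifier. The labels, scores and the 0.5 decision threshold are made-up toy values, not data or settings from the paper, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 (default) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def jaccard_index(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


def auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (assumed values): 1 = default, 0 = no default.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.65, 0.4, 0.55, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
jac = jaccard_index(y_true, y_pred)
roc_auc = auc(y_true, scores)
```

Accuracy, F1 and the Jaccard index depend on the chosen threshold, whereas AUC is computed from the ranking of the scores alone, which is one reason the measures are usually reported together.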

Impact statement

Financial institutions, specifically banks, have questioned whether transformations using Weights of Evidence (WoE) are significant in quantifying the relationship between categorical independent variables for various types of credit data. This study provides insights for those considering feature transformation for credit risk modelling in commercial banking. The transformation technique is particularly useful where statistical predictive modelling techniques are employed. The results revealed that logistic regression models can not only perform similarly to machine learning models but also outperform them. The best performance is attributed to the model's simplicity, interpretability, and the access it gives to understanding the features of individual clients within a portfolio of credit products. The logistic regression model without transformation performed best out of the five machine learning models. In terms of business impact, enhancing the logistic regression model with a WoE transformation did not improve the model's performance for the commercial banking data considered. However, the transformation did provide insights regarding each binned categorical independent variable. Our findings therefore assist banks in managing the impact and interpretability of each binned feature category on the discriminatory power of credit scoring.
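To make the WoE transformation concrete, here is a minimal sketch under one common convention, WoE = ln(share of non-defaults in the bin / share of defaults in the bin), with the Information Value (IV) as the sum over bins of the difference in shares times the WoE. The sign convention, the bin names and the toy data are assumptions for illustration; the paper's exact binning and definition may differ.

```python
import math
from collections import Counter


def weight_of_evidence(categories, defaults):
    """Return ({bin: WoE}, information_value) for one binned feature.

    categories: the bin label of each client (hypothetical names below).
    defaults:   1 if the client defaulted ("bad"), 0 otherwise ("good").
    """
    good, bad = Counter(), Counter()
    for cat, d in zip(categories, defaults):
        (bad if d == 1 else good)[cat] += 1

    total_good = sum(good.values())
    total_bad = sum(bad.values())

    woe, iv = {}, 0.0
    for cat in sorted(set(good) | set(bad)):
        if good[cat] == 0 or bad[cat] == 0:
            # WoE is undefined for an empty class; in practice bins are
            # merged or counts are smoothed (not done in this sketch).
            raise ValueError(f"bin {cat!r} has no goods or no bads")
        dist_good = good[cat] / total_good
        dist_bad = bad[cat] / total_bad
        woe[cat] = math.log(dist_good / dist_bad)
        iv += (dist_good - dist_bad) * woe[cat]
    return woe, iv


# Toy data (assumed): two bins of a single categorical feature.
categories = ["bin_a"] * 4 + ["bin_b"] * 4
defaults = [0, 0, 0, 1, 0, 1, 1, 1]

woe, iv = weight_of_evidence(categories, defaults)

# The transformed feature replaces each bin label with its WoE value.
woe_encoded = [woe[cat] for cat in categories]
```

Under this convention a positive WoE marks a bin with proportionally more non-defaulting clients, which is what gives the per-bin interpretability described above even when the transformation does not raise predictive performance.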

Data availability statement

The data used in the paper can be provided on request.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Ethics statement

No animal or human studies were involved; all data are non-proprietary and freely available from the internet and other non-proprietary sources.

Additional information

Funding

The authors received no direct funding for this research.

Notes on contributors

Modisane B. Seitshiro

Modisane Seitshiro completed a PhD in Business Mathematics and Informatics (BMI) at North-West University in 2020. He is currently a Senior Lecturer at the Centre for BMI - NWU. His research interests are Applied Statistics and Quantitative Risk Management.

Seshni Govender

Seshni Govender obtained her Honours BSc in Engineering from the University of the Witwatersrand, Johannesburg, South Africa, in 2019. She also completed an Honours BCom degree in Financial Modelling at UNISA in 2022 and is currently pursuing a Master of Commerce (MCom) degree in Quantitative Management at UNISA. Over more than 4 years of professional work, she has gained experience across diverse sectors including medical and financial administration, banking and consulting. Seshni is a Quantitative Analyst at Nedbank. Her research focuses on enhancing the robustness and interpretability of machine learning within Data Science, Data Engineering and Data Analysis.