Econometrics

Credit risk prediction with and without weights of evidence using quantitative learning models

Article: 2338971 | Received 14 Aug 2023, Accepted 01 Apr 2024, Published online: 15 Apr 2024

References

  • Abu Alfeilat, H. A., Hassanat, A. B., Lasassmeh, O., Tarawneh, A. S., Alhasanat, M. B., Eyal Salman, H. S., & Prasath, V. S. (2019). Effects of distance measure choice on k-nearest neighbor classifier performance: A review. Big Data, 7(4), 221–248. https://doi.org/10.1089/big.2018.0175
  • Aggarwal, A., Kasiviswanathan, S., Xu, Z., Feyisetan, O., & Teissier, N. (2021). Label inference attacks from log-loss scores [Paper presentation]. International Conference on Machine Learning (pp. 120–129). PMLR.
  • Awad, M., & Khanna, R. (2015). Support vector machines for classification. In Efficient Learning Machines: Theories, Concepts, and Applications for Engineers and System Designers (pp. 39–66). Apress: Berkeley, CA, USA. https://doi.org/10.1007/978-1-4302-5990-9_3
  • Bénédict, G., Koops, V., Odijk, D., & de Rijke, M. (2021). sigmoidF1: A smooth F1 score surrogate loss for multilabel classification. arXiv preprint arXiv:2108.10566.
  • Beniwal, S., & Arora, J. (2012). Classification and feature selection techniques in data mining. International Journal of Engineering Research & Technology (IJERT), 1(6), 1–6.
  • Bernardo, J., & Smith, A. (1994). Bayesian theory. Wiley.
  • Castelo, R., & Giudici, P. (2001). Association models for web mining. Data Mining and Knowledge Discovery, 5(3), 183–196. https://doi.org/10.1023/A:1011469000311
  • Chen, K., Zhu, K., Meng, Y., Yadav, A., & Khan, A. (2020). Mixed credit scoring model of logistic regression and evidence weight in the background of big data. In Intelligent Systems Design and Applications: 18th International Conference on Intelligent Systems Design and Applications (ISDA 2018) held in Vellore, India, December 6–8, 2018 (Vol. 1, pp. 435–443). Springer.
  • Chen, X., Chong, Z., Giudici, P., & Huang, B. (2022). Network centrality effects in peer to peer lending. Physica A: Statistical Mechanics and Its Applications, 600, 127546. https://doi.org/10.1016/j.physa.2022.127546
  • Czepiel, S. A. (2002). Maximum likelihood estimation of logistic regression models: Theory and implementation. Available at czep.net/stat/mlelr.pdf, 83.
  • De Long, E. R., De Long, D. M., & Clarke-Pearson, D. L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics, 44(3), 837–845.
  • Dumitrescu, E. I., Hué, S., Hurlin, C., & Tokpavi, S. (2022). Machine learning for credit scoring: Improving logistic regression with non-linear decision tree effects. European Journal of Operational Research, 297(3), 1178–1192. https://doi.org/10.1016/j.ejor.2021.06.053
  • Eelbode, T., Bertels, J., Berman, M., Vandermeulen, D., Maes, F., Bisschops, R., & Blaschko, M. B. (2020). Optimization for medical image segmentation: Theory and practice when evaluating with Dice score or Jaccard index. IEEE Transactions on Medical Imaging, 39(11), 3679–3690. https://doi.org/10.1109/TMI.2020.3002417
  • Figini, S., & Giudici, P. (2011). Statistical merging of rating models. Journal of the Operational Research Society, 62(6), 1067–1074. https://doi.org/10.1057/jors.2010.41
  • Giudici, P., Hadji-Misheva, B., & Spelta, A. (2020). Network based credit risk models. Quality Engineering, 32(2), 199–211. https://doi.org/10.1080/08982112.2019.1655159
  • Giudici, P., & Raffinetti, E. (2023). Safe artificial intelligence in finance. Finance Research Letters, 56, 104088. https://doi.org/10.1016/j.frl.2023.104088
  • Giudici, P. S. (2001). Bayesian data mining, with application to benchmarking and credit scoring. Applied Stochastic Models in Business and Industry, 17(1), 69–81. https://doi.org/10.1002/asmb.425
  • Groemping, U. (2019). South German credit data: Correcting a widely used data set. Beuth University of Applied Sciences, Berlin, Germany, Technical report 4/2019. http://www1.beuth-hochschule.de/FB_II/reports/Report-2019-004.pdf
  • Hand, D. J., & Henley, W. E. (1997). Statistical classification methods in consumer credit scoring: A review. Journal of the Royal Statistical Society Series A: Statistics in Society, 160(3), 523–541. https://doi.org/10.1111/j.1467-985X.1997.00078.x
  • Henley, W., & Hand, D. J. (1996). A k-nearest-neighbour classifier for assessing consumer credit risk. The Statistician, 45(1), 77–95. https://doi.org/10.2307/2348414
  • Hull, J., & Suo, W. (2002). A methodology for assessing model risk and its application to the implied volatility function model. The Journal of Financial and Quantitative Analysis, 37(2), 297–318. https://doi.org/10.2307/3595007
  • James, G., Witten, D., Hastie, T., Tibshirani, R., & Taylor, J. (2023). Statistical learning. In An introduction to statistical learning: With applications in Python (pp. 15–67). Springer.
  • Joseph, V. R. (2022). Optimal ratio for data splitting. Statistical Analysis and Data Mining: The ASA Data Science Journal, 15(4), 531–538. https://doi.org/10.1002/sam.11583
  • Kaggle. (2020). South German credit prediction. Society for Data Science, BIT Mesra, Kaggle. https://www.kaggle.com/c/south-german-credit-prediction/overview
  • Korkmaz, M., Güney, S., & Yiğiter, Ş. (2012). The importance of logistic regression implementations in the Turkish livestock sector and logistic regression implementations/fields. Harran Tarım ve Gıda Bilimleri Dergisi, 16(2), 25–36.
  • Kumar, A., Sharma, S., & Mahdavi, M. (2021). Machine learning (ML) technologies for digital credit scoring in rural finance: A literature review. Risks, 9(11), 192. https://doi.org/10.3390/risks9110192
  • Liu, H., Li, J., & Wong, L. (2002). A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns. Genome Informatics, 13, 51–60.
  • Mačerinskienė, I., Ivaškevičiūtė, L., & Railienė, G. (2014). The financial crisis impact on credit risk management in commercial banks. KSI Transactions on Knowledge Society, 7(1), 5–16.
  • Muchai, E., & Odongo, L. (2014). Comparison of crisp and fuzzy classification trees using Gini index impurity measure on simulated data. European Scientific Journal, 10(18), 130–134.
  • Nehrebecka, N. (2018). Predicting the default risk of companies. Comparison of credit scoring models: Logit vs support vector machines. Econometrics, 22(2), 54–73. https://doi.org/10.15611/eada.2018.2.05
  • Paryudi, I. (2019). What affects k value selection in k-nearest neighbor. International Journal of Scientific & Technology Research, 8, 86–92.
  • Persson, R. (2021). Weight of evidence transformation in credit scoring models: How does it affect the discriminatory power? [PhD thesis]. Lund University.
  • Podgorelec, V., Kokol, P., Stiglic, B., & Rozman, I. (2002). Decision trees: An overview and their use in medicine. Journal of Medical Systems, 26(5), 445–463. https://doi.org/10.1023/a:1016409317640
  • Satchidananda, S., & Simha, J. B. (2006). Comparing decision trees with logistic regression for credit risk analysis. International Institute of Information Technology.
  • Seitshiro, M., & Mashele, H. (2022). Quantification of model risk that is caused by model misspecification. Journal of Applied Statistics, 49(5), 1065–1085. https://doi.org/10.1080/02664763.2020.1849055
  • Stanfill, C., & Waltz, D. (1986). Toward memory-based reasoning. Communications of the ACM, 29(12), 1213–1228. https://doi.org/10.1145/7902.7906
  • Suhadolnik, N., Ueyama, J., & Da Silva, S. (2023). Machine learning for enhanced credit risk assessment: An empirical approach. Journal of Risk and Financial Management, 16(12), 496. https://doi.org/10.3390/jrfm16120496
  • Tu, J. V. (1996). Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. Journal of Clinical Epidemiology, 49(11), 1225–1231. https://doi.org/10.1016/s0895-4356(96)00002-9
  • Wang, Y., Zhang, Y., Lu, Y., & Yu, X. (2020). A comparative assessment of credit risk model based on machine learning: A case study of bank loan data. Procedia Computer Science, 174, 141–149. https://doi.org/10.1016/j.procs.2020.06.069
  • Weed, D. L. (2005). Weight of evidence: A review of concept and methods. Risk Analysis: An Official Publication of the Society for Risk Analysis, 25(6), 1545–1557. https://doi.org/10.1111/j.1539-6924.2005.00699.x
  • Xia, Y., Liu, C., Li, Y., & Liu, N. (2017). A boosted decision tree approach using Bayesian hyper-parameter optimization for credit scoring. Expert Systems with Applications, 78, 225–241. https://doi.org/10.1016/j.eswa.2017.02.017
  • Yang, X., Zhu, Y., Yan, L., & Wang, X. (2015). Credit risk model based on logistic regression and weight of evidence [Paper presentation]. 3rd International Conference on Management Science, Education Technology, Arts, Social Science and Economics (pp. 810–814). Atlantis Press. https://doi.org/10.2991/msetasse-15.2015.180
  • Yap, B. W., Ong, S. H., & Husain, N. H. M. (2011). Using data mining to improve assessment of credit worthiness via credit scoring models. Expert Systems with Applications, 38(10), 13274–13283. https://doi.org/10.1016/j.eswa.2011.04.147
  • Zhang, A. (2009). Statistical methods in credit risk modeling [PhD thesis]. University of Michigan.