Research Article

A novel bias-alleviated hybrid ensemble model based on over-sampling and post-processing for fair classification

Article: 2184310 | Received 12 Dec 2022, Accepted 21 Feb 2023, Published online: 17 Mar 2023
 

Abstract

With the rapid development of machine learning for classification, fairness has become a research priority second only to prediction accuracy. However, data bias and algorithmic discrimination, both of which undermine fair classification, have not been well resolved, and they may harm or unduly benefit specific groups defined by sensitive attributes (e.g. age, race, and gender). To alleviate the unfairness of classification models, this study proposes a novel bias-alleviated hybrid ensemble model (BAHEM) based on over-sampling and post-processing. First, a new clustering-based over-sampling method is proposed to reduce the data bias caused by imbalance in the label and the sensitive attribute. Then, a stacking-based ensemble learning method is employed to improve the performance and robustness of the BAHEM. Finally, a new post-processing method based on classification with alternating normalisation (CAN) is proposed to further improve fairness while maintaining the accuracy of the BAHEM. Three datasets with different sensitive attributes and four evaluation metrics were used to evaluate the prediction accuracy and fairness of the BAHEM. The experimental results indicate that the BAHEM achieves superior fairness with little reduction in accuracy.
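To make the notion of imbalance in label and sensitive attribute concrete, the following is a minimal sketch and not the authors' method. It assumes plain random duplication in place of the clustering-based over-sampling described above, and it assumes statistical parity difference as the fairness measure, since the abstract does not name its four metrics. The keys `label` and `group`, the function names, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def oversample_cells(rows, label_key="label", group_key="group", seed=0):
    """Duplicate rows at random until every (label, group) cell matches the largest cell.

    Assumption: random duplication stands in for the paper's clustering-based
    over-sampling, which the abstract does not specify in detail.
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for row in rows:
        cells[(row[label_key], row[group_key])].append(row)
    target = max(len(members) for members in cells.values())
    balanced = []
    for members in cells.values():
        balanced.extend(members)
        # Top up smaller cells by sampling their own members with replacement.
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced


def statistical_parity_difference(outcomes, groups, privileged):
    """P(outcome = 1 | unprivileged) - P(outcome = 1 | privileged); 0 means parity."""
    priv = [y for y, g in zip(outcomes, groups) if g == privileged]
    unpriv = [y for y, g in zip(outcomes, groups) if g != privileged]
    return sum(unpriv) / len(unpriv) - sum(priv) / len(priv)


# Hypothetical toy data: group "a" receives the favourable label far more often.
rows = (
    [{"label": 1, "group": "a"}] * 6
    + [{"label": 0, "group": "a"}] * 2
    + [{"label": 1, "group": "b"}] * 1
    + [{"label": 0, "group": "b"}] * 3
)
balanced = oversample_cells(rows)

before = statistical_parity_difference(
    [r["label"] for r in rows], [r["group"] for r in rows], privileged="a"
)
after = statistical_parity_difference(
    [r["label"] for r in balanced], [r["group"] for r in balanced], privileged="a"
)
print(f"parity difference before: {before:.2f}, after: {after:.2f}")
```

Here the measure is applied to the training labels rather than to model predictions, so it shows only how balancing the four (label, group) cells removes the disparity present in the data itself. The stacking and CAN-based post-processing stages are not sketched.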

Disclosure statement

No potential conflict of interest was reported by the author(s).

Compliance with ethical standards

Conflicts of interest: The authors declare that there is no conflict of interest regarding the publication of this article.

Ethical standard: The authors state that this research complies with ethical standards. This research does not involve either human participants or animals.

Data availability statement

The datasets analysed during the current study are available in the UCI repository. The German dataset is from https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german, the Adult dataset is from https://archive.ics.uci.edu/ml/machine-learning-databases/adult, and the Bank dataset is from https://archive.ics.uci.edu/ml/machine-learning-databases/00222.

Additional information

Funding

This work was supported by the Fundamental Research Funds for the Provincial Universities of Zhejiang Institute of Economics and Trade (No. 19YQ19), the National Natural Science Foundation of China (No. 51875503), the Zhejiang Natural Science Foundation of China (No. LZ20E050001), and the Zhejiang Key R & D Project of China (No. 2022C03166).