Abstract
With the rapid development of machine learning for classification, fairness has become a research priority second only to prediction accuracy. However, data bias and algorithmic discrimination, which undermine the fairness of classification models, have not been well resolved and may harm or unduly benefit specific groups defined by sensitive attributes (e.g. age, race, and gender). To alleviate the unfairness of classification models, this study proposes a novel bias-alleviated hybrid ensemble model (BAHEM) based on over-sampling and post-processing. First, a new clustering-based over-sampling method is proposed to reduce the data bias caused by imbalance in the label and the sensitive attribute. Then, a stacking-based ensemble learning method is employed to improve the performance and robustness of the BAHEM. Finally, a new post-processing method based on classification with alternating normalisation (CAN) is proposed to further improve fairness while maintaining the accuracy of the BAHEM. Three datasets with different sensitive attributes and four evaluation metrics were used to evaluate the prediction accuracy and fairness of the BAHEM. The experimental results verify the superior fairness of the BAHEM, at the cost of only a small reduction in accuracy.
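To make the notion of imbalance in the label and the sensitive attribute concrete, the following minimal sketch equalises the size of every (label, sensitive attribute) cell by plain random over-sampling. It is not the clustering-based method proposed in this study; it only illustrates the kind of imbalance that method targets. The function name `balance_groups`, its parameters, and the dictionary-based row format are hypothetical choices made for this illustration.

```python
import random
from collections import defaultdict


def balance_groups(rows, label_key, sensitive_key, seed=0):
    """Duplicate rows at random until every (label, sensitive) cell
    is as large as the largest cell.

    Illustrative only: simple random over-sampling, not the
    clustering-based over-sampling described in the abstract.
    """
    if not rows:
        return []

    rng = random.Random(seed)

    # Group rows by their (label, sensitive attribute) combination.
    cells = defaultdict(list)
    for row in rows:
        cells[(row[label_key], row[sensitive_key])].append(row)

    # Every cell is grown to the size of the largest one.
    target = max(len(members) for members in cells.values())

    balanced = []
    for members in cells.values():
        balanced.extend(members)
        # Sample with replacement to fill the gap (k may be 0).
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced
```

Applied to a dataset in which one combination of label and sensitive attribute dominates, the function returns a list in which all observed combinations occur equally often. Combinations that never occur in the input cannot be created by duplication and stay absent.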
Disclosure statement
No potential conflict of interest was reported by the author(s).
Compliance with ethical standards
Conflicts of interest: The authors declare that there are no conflicts of interest regarding the publication of this article.
Ethical standard: The authors state that this research complies with ethical standards. This research does not involve either human participants or animals.
Data availability statement
The datasets analysed during the current study are available in the UCI repository. The German dataset is available at https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german, the Adult dataset at https://archive.ics.uci.edu/ml/machine-learning-databases/adult, and the Bank dataset at https://archive.ics.uci.edu/ml/machine-learning-databases/00222.
Notes
1 Available at https://github.com/yhefang/BAHEM.