Abstract
The introduction of smart meters, sensors and integrated electronic devices in the electrical secondary distribution network (ESDN) has led to the collection of massive amounts of data. Accurate prediction of faults from these data can help to improve the reliability, safety and operational efficiency of the ESDN. Because of their complexity, ESDN big data are hard to process and manage with traditional technologies and tools. This complexity arises from issues including high dimensionality, imbalance and variability, and a current challenge is to address all of them simultaneously. The capability of existing fault prediction techniques to meet this challenge remains limited, and new approaches are needed. To this end, this article presents a big data-based ensemble for fault prediction in ESDN (BDEFP-ESDN), built on Apache Spark with gradient-boosted trees, random forest, decision tree and binomial logistic regression as base models. BDEFP-ESDN is optimized for the complexity of the ESDN dataset through dimension reduction, bootstrap sampling and hyperparameter optimization in the training process, and through weighted voting in the prediction process. Our experimental results demonstrate the effectiveness of BDEFP-ESDN compared with traditional classifiers such as ANN, SVM, RF and XGB, with the ensemble achieving an accuracy of 99.6% for both binary and multiclass classification.
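As an illustration of the weighted voting step mentioned above, the following is a minimal sketch in plain Python rather than Apache Spark. The abstract does not state how the weights are derived, so the weights, model keys and labels below are hypothetical and chosen only to show the mechanism: each base model casts a vote for its predicted class, votes are scaled by the model's weight, and the class with the largest total wins.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: dict mapping model name -> predicted label
    weights: dict mapping model name -> non-negative weight
    """
    totals = defaultdict(float)
    for model, label in predictions.items():
        totals[label] += weights[model]
    # Highest total wins; ties are broken by label order so the
    # result is deterministic.
    return min(totals, key=lambda label: (-totals[label], str(label)))


# Hypothetical weights (assumption: e.g. each model's validation accuracy).
weights = {"gbt": 0.97, "rf": 0.96, "dt": 0.90, "lr": 0.85}

# Hypothetical predictions from the four base models for one sample.
predictions = {"gbt": "fault", "rf": "normal", "dt": "fault", "lr": "normal"}

result = weighted_vote(predictions, weights)
print(result)
```

In this example the two models voting "fault" carry a combined weight of 1.87 against 1.81 for "normal", so the ensemble predicts "fault" even though the raw vote count is tied.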
Acknowledgment
This research was carried out as part of the iGrid-Project at the University of Dar es Salaam (UDSM), under the sponsorship of the Swedish International Development Agency (SIDA). The authors also thank TANESCO for its collaboration during the research.
Disclosure statement
No conflict of interest.
Additional information
Notes on contributors
David T. Makota
David Makota is an assistant lecturer and consultant in the Department of Computer Science at the Institute of Finance Management (IFM) in Dar es Salaam, Tanzania. He is pursuing his doctoral degree in Computer and IT Systems Engineering at the University of Dar es Salaam. His research interests include machine learning, big data, knowledge sharing, smart grid and database systems.