Lightweight network for insulator fault detection based on improved YOLOv5

Abstract

Severe damage to insulators can disrupt the daily operation of the power system. Current fault diagnosis relies heavily on manual visual inspection, which is inefficient and error-prone. Although computer vision algorithms have been developed, their demanding runtime requirements limit their applicability on edge devices. In addition, insulator flashover faults are difficult to identify, which has limited the effectiveness of fault diagnosis. To address these issues, we introduce a novel one-stage network that enables real-time detection of insulator faults on mobile devices. We designed a new module that reduces the computational complexity of the network and fused it with the SimAM attention mechanism to address the low efficiency of flashover fault detection. We also deploy multiple models on embedded devices. Results indicate that YOLOv5s-L-SimAM achieves an mAP of 93.9% with the model size compressed to 9.4 MB, and runs at 9.5 FPS on the Jetson Xavier NX.

Keywords:

1. Introduction

As the demand for electricity continues to rise, transmission lines have become increasingly important for keeping power supply interruptions to a minimum (Yi et al., 2023). Transmission lines consist of various components serving different functions, and the insulator is one of the most crucial (Han et al., 2023a). Insulators isolate, protect and support the conductors, ensuring the normal operation of the power system, preventing current leakage and overvoltage, and protecting equipment from the external environment (Dai, 2022; Deng et al., 2022). Because of outdoor pollution and unpredictable weather, insulators are occasionally damaged. Insulator damage or failure can be caused by various factors, such as flashover voltage or a combination of several compromised components, including accessory failures, and can ultimately lead to power outages and large economic losses (Liu et al., 2021). A flashover event on a transmission line can result in widespread power outages or even catastrophic accidents (Salem et al., 2022; Sun et al., 2022; Zhang et al., 2023). The continuous operation of the power system therefore relies heavily on the prompt and accurate detection of insulator defects.

Advances in deep learning and unmanned aerial vehicles (UAVs) enable algorithms to detect defective insulators automatically in aerial images or videos, which has become increasingly common practice in the industry. Rapid and accurate identification of insulators in images captured during drone inspections has therefore become a significant research focus. Traditional detection algorithms for aerial images rely heavily on local features and spatial sequences, which may make them unsuitable for images with complex backgrounds. Their results are also affected by factors such as lighting conditions and viewing angle. As a result, deep learning algorithms have progressively supplanted traditional algorithms. Since Girshick et al. (2014) introduced R-CNN, CNN-based object detection has surpassed traditional methods. To further improve efficiency, Girshick (2015) proposed Fast R-CNN, which was followed by Faster R-CNN (Ren et al., 2017). Redmon et al. (2016) then proposed YOLO, which pioneered one-stage object detection. Compared with two-stage algorithms, the YOLO family adopts several design principles that improve detection speed; however, its accuracy remains unsatisfactory.

In recent years, remarkable progress has been made in applying neural networks to insulator detection in images. Wang et al. (2018) combined Faster R-CNN with semantic segmentation to detect insulator defects. However, this approach did not perform satisfactorily on flashover faults and missing-insulator defects, so subsequent studies shifted from semantic segmentation to object detection for insulators. Miao et al. (2019) integrated a two-stage fine-tuning strategy with the Single Shot Multibox Detector (SSD), but their dataset did not cover insulator fault types. Zheng et al. (2021) used an enhanced fusion-based SSD for insulator detection, but their dataset discarded much of the background information, so some fault information was lost and some faults could not be detected. The YOLO series, a classic family of object detection algorithms, has gone through multiple versions and given rise to many lightweight networks. Xing Chen (2022) replaced the YOLOv4 backbone with MobileNet (MobileNet–YOLOv4) to build a compact network, but the computational complexity of its layers still made deployment on embedded devices impractical. Liu et al. (2022) introduced a model for accurately locating insulators; its average precision for missing-insulator defects reached 98.4%, exceeding Faster R-CNN by 5.2% and SSD by 10.2%, but their dataset did not include other visible insulator defects. Feng et al. (2023) employed the Gabor algorithm to predict insulators within the identified foreground. Li et al. (2023) used infrared images to detect power equipment faults; however, infrared images carry relatively limited information, and some faults cannot be identified from them.

To increase the representational power of neural networks, many researchers have added attention mechanisms. Han et al. (2023b) incorporated attention mechanisms into YOLOv4 so that the network focuses more on important image information, but their study covered only missing insulators and did not address other fault types. Antwi-Bekoe et al. (2022) fused an attention mechanism into their proposed network; however, the segmentation approach they used could not identify other insulator faults. Han et al. (2022) integrated the ECA-Net attention mechanism into their network, which improved its ability to identify overlapping insulator targets and notably enhanced overall insulator detection performance. However, their model is large, which makes it impractical to deploy on embedded hardware.

In summary, current insulator fault detection algorithms typically process large amounts of sensor data and image information, which places high demands on computational and storage resources. Most of them are complex, which makes deployment on edge devices difficult, and their high computational latency and response time fail to meet real-time requirements. Lightweight models reduce the computational complexity of the algorithm and thus improve its running speed, better meeting the real-time requirements of edge devices. We therefore chose YOLOv5s, because it strikes a favourable balance between speed and accuracy among one-stage detection algorithms. In addition, to meet the deployment and real-time requirements while keeping the use of model parameters efficient, we propose a new lightweight convolution named GhostConv and integrate the SimAM module, which keeps the number of parameters very low while improving feature extraction. To validate the results, we deployed the models on several different embedded devices and compared their inference times. Our main contributions are threefold:

We propose a novel lightweight convolution that integrates the Ghost module with depthwise separable convolution, and use it in the neck of the YOLOv5s network. This design effectively reduces the number of parameters and FLOPs.
We introduce a novel module named C3_SimAM, which improves the utilisation of model parameters and achieves faster inference on hardware devices.
To ensure real-time detection and better compatibility with edge devices, we optimise the detection algorithm, propose YOLOv5s-L-SimAM and deploy it on different edge computers to verify its efficiency.

2. Architecture of the YOLOv5 network

YOLO performs strongly on the COCO dataset (Figure 1). It is parameter-efficient and fast, which makes it suitable for embedded devices. In this article, we use YOLOv5s from the fifth generation of YOLO, which takes input images of $640 \times 640$ pixels. The model has three detection layers for small, medium and large objects.
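
As a quick check of the three-scale design, the sketch below computes the grid size of each detection layer for a $640 \times 640$ input. It assumes the standard YOLOv5 strides of 8, 16 and 32 and three anchors per grid cell; the helper names are hypothetical and not part of the YOLOv5 code base.

```python
# Sketch: grid sizes of the three YOLOv5 detection layers for a square input.
# Assumptions: standard strides 8/16/32 and three anchors per grid cell.
from typing import List, Tuple

STRIDES = (8, 16, 32)   # small, medium and large objects
ANCHORS_PER_CELL = 3


def detection_grids(img_size: int = 640) -> List[Tuple[int, int]]:
    """Return (stride, grid side length) for each detection layer."""
    if img_size % max(STRIDES) != 0:
        raise ValueError("input size must be a multiple of the largest stride")
    return [(s, img_size // s) for s in STRIDES]


def num_candidate_boxes(img_size: int = 640) -> int:
    """Total number of boxes predicted before non-maximum suppression."""
    return sum(g * g * ANCHORS_PER_CELL for _, g in detection_grids(img_size))


if __name__ == "__main__":
    for stride, grid in detection_grids(640):
        print(f"stride {stride:2d}: {grid}x{grid} grid")
    print("candidate boxes:", num_candidate_boxes(640))  # 25200
```

Under these assumptions the finest 80×80 grid handles small objects and the coarsest 20×20 grid handles large ones.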

Figure 1. Training and deployment process of insulator detection model.

As Figure 2 shows, the YOLOv5 architecture consists of three main parts: the backbone network, the neck network and the head network. The backbone is CSPDarknet (Wang et al., 2020), which contains C3 modules (CSP bottlenecks with three convolutions), CBS modules (convolution, batch normalisation and SiLU activation) and an SPP (Spatial Pyramid Pooling) layer (He et al., 2015). Batch normalisation helps mitigate vanishing and exploding gradients. The SPP layer combines max-pooling layers with different kernel sizes and concatenates their outputs to aggregate features at multiple scales.

Figure 2. The framework of YOLOv5.

The backbone processes the input image and extracts feature maps, referred to as feature layers. The neck network then fuses the three effective feature layers obtained from the backbone; fusing features from different levels significantly improves the network's ability to detect targets of different scales. Finally, the detection head outputs bounding boxes with coordinates, categories and confidence scores. The generalised intersection over union (GIoU) loss is used to optimise the localisation of the boxes, and weighted non-maximum suppression (NMS) eliminates redundant bounding boxes, which improves the accuracy of the detected objects.
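
To make these two steps concrete, here is a minimal pure-Python sketch of the GIoU loss and of non-maximum suppression for boxes given as (x1, y1, x2, y2). For simplicity it implements plain greedy NMS rather than the weighted variant mentioned above, and the IoU threshold of 0.45 is an assumed value; the function names are illustrative only.

```python
# Sketch: GIoU loss and plain (unweighted) greedy NMS for axis-aligned boxes.
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def _area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def _inter(a: Box, b: Box) -> float:
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    return w * h


def iou(a: Box, b: Box) -> float:
    union = _area(a) + _area(b) - _inter(a, b)
    return _inter(a, b) / union if union > 0 else 0.0


def giou(a: Box, b: Box) -> float:
    """IoU minus the share of the smallest enclosing box not covered by the union."""
    inter = _inter(a, b)
    union = _area(a) + _area(b) - inter
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    if union <= 0 or enclose <= 0:
        return 0.0
    return inter / union - (enclose - union) / enclose


def giou_loss(pred: Box, target: Box) -> float:
    return 1.0 - giou(pred, target)


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_thr: float = 0.45) -> List[int]:
    """Keep boxes in descending score order, dropping any that overlap a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
    print(nms(boxes, [0.9, 0.8, 0.7]))                      # [0, 2]
    print(round(giou_loss((0, 0, 1, 1), (2, 0, 3, 1)), 4))  # 1.3333
```

Unlike plain IoU, GIoU stays informative for non-overlapping boxes (it becomes negative), so the loss still provides a signal when a prediction misses its target entirely.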

3. Proposed method

Although YOLOv5 generalises well, YOLOv5s still has high computational complexity and many redundant parameters, which hinder its deployment on mobile devices with limited computational resources. We therefore introduce a novel lightweight network, YOLOv5s-L-SimAM, for identifying insulator defects.

3.1. The GhostConv

Deep neural networks tend to generate many similar feature maps, which reduces the efficiency of feature extraction and is an obstacle to deploying models at the edge. GhostNet (Han et al., 2020) was proposed to deal with this problem.

As Figure 3 shows, the Ghost module works in three stages. First, it produces a small number of intrinsic feature maps by ordinary convolution. Second, ghost feature maps are generated by applying a series of cheap linear operations to each intrinsic feature map. Finally, the output is formed by concatenating the feature maps obtained in the first and second stages.

Figure 3. The structure of the ghost module.

A standard convolutional layer can be defined as follows:

$$Y = X * f + b \tag{1}$$

where $*$ denotes the convolution operation, $X \in \mathbb{R}^{h \times w \times c}$ is the input feature map ($h$ and $w$ are its height and width, and $c$ is the number of input channels), $b$ is the bias term, $Y \in \mathbb{R}^{h' \times w' \times n}$ is the output feature map with $n$ channels, and $f \in \mathbb{R}^{c \times k \times k \times n}$ are the convolution filters with kernel size $k \times k$.

The Ghost module can be formulated as follows:

$$Y' = X * f' \tag{2}$$

$$y''_{ij} = \Phi_{i,j}(y'_i), \quad \forall\, i = 1, \dots, m,\; j = 1, \dots, s \tag{3}$$

where $Y' \in \mathbb{R}^{h' \times w' \times m}$ (with $m \le n$) is obtained by applying the convolution filters $f' \in \mathbb{R}^{c \times k \times k \times m}$ to the input feature layer $X \in \mathbb{R}^{h \times w \times c}$. Because ordinary convolution generates redundant feature maps, part of it is replaced by cheap operations: the Ghost module generates fewer feature maps by full convolution, and the remaining feature maps needed to keep the same output shape are created by the computationally inexpensive operations in Equation (3). Here $y'_i$ is the $i$th feature map in $Y'$, $\Phi_{i,j}$ is the $j$th linear operation applied to $y'_i$ to generate the ghost feature map $y''_{ij}$, and the ghost feature maps together form $Y''$.

Finally, the output of the Ghost module is obtained by concatenating the outputs of the cheap operations and of the convolution:

$$Y = \mathrm{Concat}(Y'', Y') \tag{4}$$

where $\mathrm{Concat}$ denotes channel-wise concatenation and $Y$ is the overall output of the Ghost module.
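
The saving expressed by Equations (2) to (4) can be checked by counting weights. Ignoring bias terms, a standard layer needs $c \cdot k^2 \cdot n$ weights, whereas a Ghost module with $m = n/s$ intrinsic maps and $d \times d$ depthwise cheap operations needs $c \cdot k^2 \cdot m + (s-1) \cdot m \cdot d^2$, treating the last of the $s$ operations as the identity, as in GhostNet. The sketch below compares the two; $s = 2$, $d = 3$ and the layer sizes are illustrative assumptions.

```python
# Sketch: weight counts of a standard convolution vs. a Ghost module.
# Bias and batch-normalisation parameters are ignored; s and d are assumed values.


def standard_conv_params(c: int, n: int, k: int) -> int:
    """Weights of a k x k convolution mapping c input channels to n outputs."""
    return c * k * k * n


def ghost_module_params(c: int, n: int, k: int, s: int = 2, d: int = 3) -> int:
    """Weights of a Ghost module producing n channels.

    m = n / s intrinsic maps come from an ordinary k x k convolution (Eq. 2);
    each intrinsic map then gets (s - 1) cheap d x d depthwise operations
    (Eq. 3), the s-th "operation" being the identity that keeps the map itself.
    """
    if n % s != 0:
        raise ValueError("n must be divisible by s")
    m = n // s
    primary = c * k * k * m
    cheap = (s - 1) * m * d * d
    return primary + cheap


if __name__ == "__main__":
    std = standard_conv_params(128, 256, 3)
    ghost = ghost_module_params(128, 256, 3)
    print(std, ghost, round(std / ghost, 2))  # 294912 148608 1.98
```

The compression ratio approaches $s$ because the cheap depthwise term is small compared with the primary convolution.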

To deal with the problem of generating redundant feature maps, this paper proposes a new lightweight convolution based on the idea of the Ghost module, called the GhostConv.

As Figure 4 shows, GhostConv consists of three parts: the Conv module, the DSC module (Chollet, 2017) and a concatenation layer. DSC stands for depthwise separable convolution, which has two components: depthwise convolution (DW) and pointwise convolution (PW). Conv denotes a sequence of convolution, batch normalisation and the nonlinear ReLU activation function. According to GhostNet, the output of Conv contains many redundant feature maps, so GhostConv uses depthwise separable convolution to generate different feature maps. This keeps the size of the output feature maps unchanged while avoiding the low computational efficiency caused by redundant feature maps.
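
One plausible reading of Figure 4 is that Conv produces half of the output channels and the DSC branch turns those maps into the other half before concatenation. Under that assumption, and additionally assuming a $5 \times 5$ depthwise kernel and no bias terms, the sketch below compares the weight count of GhostConv with that of a standard CBS convolution of the same shape. The function names are hypothetical.

```python
# Sketch: weight count of GhostConv vs. a standard convolution (CBS).
# Assumed layout: Conv(c -> n/2, k x k) -> DSC(n/2 -> n/2) -> concat -> n.
# Bias and batch-normalisation parameters are ignored.


def cbs_params(c: int, n: int, k: int = 3) -> int:
    """Standard k x k convolution from c to n channels."""
    return c * k * k * n


def ghostconv_params(c: int, n: int, k: int = 3, d: int = 5) -> int:
    """GhostConv under the assumed layout above (d is an assumed DW kernel size)."""
    if n % 2 != 0:
        raise ValueError("n must be even so it can be split into two halves")
    half = n // 2
    conv = c * k * k * half    # ordinary convolution -> first half of the output
    depthwise = half * d * d   # DW: one d x d filter per channel
    pointwise = half * half    # PW: 1 x 1 convolution mixing channels
    return conv + depthwise + pointwise


if __name__ == "__main__":
    print(cbs_params(256, 256), ghostconv_params(256, 256))  # 589824 314496
```

In this configuration the DSC branch adds fewer than 20,000 weights on top of the half-width convolution, so most of the saving comes from halving the ordinary convolution.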

Figure 4. (A) Depthwise separable convolution; (B) Conv; (C) GhostConv.

Figure 5. Full 3-D weights for attention.

3.2. Mobile inverted residual bottleneck convolution with attention mechanisms

When an image contains multiple fault locations, the lightweight network with GhostConv cannot attend well to all of them, which lowers its recognition performance. This article therefore introduces a novel module that integrates the SimAM attention mechanism into the mobile inverted residual bottleneck convolution. The module aims to counter the accuracy degradation of lightweight networks and enhance the network's feature extraction capability.

SimAM (Yang et al., 2021) is a powerful attention module that can easily be embedded into a neural network without adding learnable parameters. Unlike existing channel-wise (1-D) and spatial-wise (2-D) attention mechanisms, SimAM infers full 3-D attention weights for the feature maps, so that channel and spatial attention work jointly to process visual information, which enhances the network's ability to focus (Figure 5).

SimAM is based on neuroscience theory: it defines an energy function for each neuron and minimises it to estimate the neuron's importance, which strengthens the extraction of important features and suppresses interference from unimportant ones. Because the minimum of the energy function has a closed-form solution, SimAM needs no additional learnable parameters or extensive structural changes, computes the attention weights quickly and allows the network to maintain its accuracy when the attention mechanism is added.
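
The closed-form solution gives each neuron $t$ an inverse minimal energy of $(t-\mu)^2 / (4(\sigma^2+\lambda)) + 0.5$, where $\mu$ and $\sigma^2$ are the mean and variance of its channel; the output is the input scaled by a sigmoid of this value. The pure-Python sketch below follows that formula for feature maps given as lists of flattened channels. It is an illustration rather than the authors' code, and $\lambda = 10^{-4}$ is an assumed default.

```python
# Sketch of SimAM for one feature map stored as a list of channels,
# each channel being a flat list of its h*w activations.
import math
from typing import List


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def simam_weights(channel: List[float], e_lambda: float = 1e-4) -> List[float]:
    """Attention weight of every neuron in one channel (no learnable parameters)."""
    if len(channel) < 2:
        raise ValueError("a channel needs at least two activations")
    mu = sum(channel) / len(channel)
    sq_dev = [(t - mu) ** 2 for t in channel]
    var = sum(sq_dev) / (len(channel) - 1)  # sigma^2 estimated with an M - 1 denominator
    # inverse minimal energy: neurons that stand out from their channel score higher
    return [_sigmoid(d / (4.0 * (var + e_lambda)) + 0.5) for d in sq_dev]


def simam(feature_map: List[List[float]], e_lambda: float = 1e-4) -> List[List[float]]:
    """Scale every activation by its SimAM weight, channel by channel."""
    return [
        [t * w for t, w in zip(ch, simam_weights(ch, e_lambda))]
        for ch in feature_map
    ]


if __name__ == "__main__":
    w = simam_weights([0.0, 0.0, 0.0, 10.0])
    print([round(x, 3) for x in w])  # the outlier 10.0 gets the largest weight
```

Because the weights come from channel statistics alone, the module adds computation but no weights to the network.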

Sandler et al. (2018) introduced inverted residuals and linear bottlenecks in MobileNetV2. The inverted residual block uses a pointwise convolution to expand the channels, a depthwise convolution in the intermediate expansion layer and a final pointwise convolution to reduce the dimensions. In MobileNetV3 (Howard et al., 2019), an SE module (Woo et al., 2018) was attached for channel attention. Building on these modules, we propose a new module called AM_Conv (Figure 6). It keeps the structure of the inverted residual linear bottleneck but replaces the SE module with SimAM. Following YOLOv5, SiLU is used as the activation function.
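
The practical benefit of replacing SE with SimAM is that SimAM adds no weights. The sketch below counts the weights of an inverted residual bottleneck with an SE block versus SimAM. The expansion factor $t = 6$, the SE reduction ratio of 4 and the $3 \times 3$ depthwise kernel are assumptions chosen for illustration, not values reported for AM_Conv, and bias and batch-normalisation parameters are ignored.

```python
# Sketch: weight count of an inverted residual bottleneck with SE or SimAM.
# Layout: 1x1 expand -> k x k depthwise -> attention -> 1x1 linear projection.
# t, k and se_ratio are illustrative assumptions.


def inverted_residual_params(c: int, t: int = 6, k: int = 3,
                             attention: str = "simam", se_ratio: int = 4) -> int:
    hidden = c * t
    expand = c * hidden          # pointwise expansion
    depthwise = hidden * k * k   # one k x k filter per hidden channel
    project = hidden * c         # pointwise linear projection back to c
    if attention == "simam":
        att = 0                  # SimAM is parameter-free
    elif attention == "se":
        squeeze = hidden // se_ratio
        att = 2 * hidden * squeeze   # two fully connected layers
    else:
        raise ValueError(f"unknown attention: {attention}")
    return expand + depthwise + project + att


if __name__ == "__main__":
    print(inverted_residual_params(64, attention="simam"))  # 52608
    print(inverted_residual_params(64, attention="se"))     # 126336
```

The residual connection itself adds no weights; in MobileNetV2 it is used only when the stride is 1 and the input and output channel counts match.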

Figure 6. The AM_Conv block.

3.3. The YOLOv5s-L-SimAM network

In the neck network, C3_SimAM modules are used to extract valuable semantic information, and the CBS modules are replaced with GhostConv to build the lightweight network. GhostConv can improve the model's efficiency, especially in resource-constrained environments: it reduces the model's storage requirements and computational cost while retaining its representation capability, which makes it well suited to lightweight model design and mobile applications. Meanwhile, the C3_SimAM module improves object detection and localisation, leading to fewer false positives and higher detection accuracy. Together, these changes let the network use information from all channels and focus on important positions. The YOLOv5s-L-SimAM network framework is shown in Figure 7.

Figure 7. The overall improved network framework of YOLOv5.

4. Experiments

This section describes the experimental setup and the dataset, and compares the results of our model with those of several mainstream algorithms. Ablation experiments were also conducted to examine the effect of each proposed technique. Finally, three models are selected and tested for inference on different embedded devices.

4.1. Experimental conditions and parameter settings

All experiments were conducted on the same device; the hardware configuration and experimental environment are listed in Table 1.

Table 1. Hardware configuration and operating environment.

4.2. Experimental setup

Based on the available hardware, the model was trained for 600 epochs with the SGD optimiser, using a cosine learning rate schedule with a peak learning rate of 0.01 and a minimum learning rate of 0.0001. We also applied commonly used regularisation and data augmentation methods, including weight decay, mixup and RandAugment. In addition, weights pre-trained on the COCO dataset were used for transfer learning.
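
A cosine schedule with these endpoints can be written as $\eta(e) = \eta_{\min} + \tfrac{1}{2}(\eta_{\max} - \eta_{\min})(1 + \cos(\pi e / E))$. The sketch below evaluates it per epoch for $E = 600$, $\eta_{\max} = 0.01$ and $\eta_{\min} = 0.0001$; it omits any warm-up phase, which is an assumption because the text does not mention one.

```python
# Sketch: per-epoch cosine learning-rate schedule (no warm-up assumed).
import math


def cosine_lr(epoch: int, total_epochs: int = 600,
              lr_max: float = 0.01, lr_min: float = 0.0001) -> float:
    """Learning rate at a given epoch, decaying from lr_max to lr_min."""
    progress = min(max(epoch / total_epochs, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for e in (0, 150, 300, 450, 600):
        print(e, round(cosine_lr(e), 6))
```

Halfway through training (epoch 300) the rate is exactly the midpoint of the two endpoints, 0.00505.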

4.3. Experimental environment of edge devices

To verify the efficiency of the model on edge computing devices, three NVIDIA edge devices that differ in computing power, power consumption and AI performance were selected as experimental platforms; Table 2 compares them. Ubuntu is used as the operating system on all embedded devices. All devices have adequate cooling, including passive heat sinks and fans, and all deployment experiments were run from a 12 V power input with sufficient current supply capacity.

Table 2. Edge device hardware resources.

To make these devices easier to integrate, we also designed a dedicated on-board circuit for the NVIDIA edge computing boards, which can be easily embedded into UAVs for practical applications. Photographs of the hardware are shown in Figure 8.

Figure 8. The hardware device.

4.4. The insulator fault dataset

The experimental dataset consists of aerial insulator images captured by UAVs. As shown in Figure 9, the samples were expanded by rotation, stitching and flipping, giving a total of 1630 images with an overall size of 2.54 GB. The dataset contains two types of insulator fault; examples of both categories are shown in Figure 10.
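
Rotating or flipping an image also requires transforming its box labels. As a minimal sketch, assuming YOLO-format labels (class, normalised centre x, centre y, width, height), flips and a 90° clockwise rotation map the boxes as follows; stitching is not covered, and the function names are hypothetical.

```python
# Sketch: update YOLO-format box labels when an image is flipped or rotated.
# A label is (cls, xc, yc, w, h) with all coordinates normalised to [0, 1].
from typing import Tuple

Label = Tuple[int, float, float, float, float]


def hflip(label: Label) -> Label:
    """Horizontal flip: mirror the x centre, keep the box size."""
    cls, xc, yc, w, h = label
    return cls, 1.0 - xc, yc, w, h


def vflip(label: Label) -> Label:
    """Vertical flip: mirror the y centre, keep the box size."""
    cls, xc, yc, w, h = label
    return cls, xc, 1.0 - yc, w, h


def rot90_cw(label: Label) -> Label:
    """90-degree clockwise rotation: the image's width and height swap,
    so a point (x, y) moves to (1 - y, x) and the box width/height swap."""
    cls, xc, yc, w, h = label
    return cls, 1.0 - yc, xc, h, w


if __name__ == "__main__":
    box = (0, 0.1, 0.2, 0.1, 0.2)  # small box near the top-left corner
    print(hflip(box))     # moves near the top-right corner
    print(rot90_cw(box))  # also lands near the top-right after rotation
```

Applying `rot90_cw` four times returns the original label, which is a convenient sanity check for this kind of transform.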

Figure 9. Dataset expansion method: (A) Normal, (B) Flip, (C) Rotate and (D) Switch.

Figure 10. The example of the dataset: (A) Insulator Broken; (B) Insulator Flashover.

Before training and validation, the dataset is divided into a training set and a validation set. The training set is used to fit the model parameters, and after the parameters are updated the model's performance is assessed on the validation set, which supports efficient parameter tuning and model stability. Dividing the dataset randomly distributes the image samples evenly and helps prevent over-training on similar images taken from repeated positions. We used Python's built-in random functions to partition the data and used the same partition for training and validation in all experiments. This approach facilitates model training and evaluation and reduces the risk of overfitting. The dataset is randomly split in a 3:1 ratio, with 1222 images in the training set and 408 images in the validation set. The training set includes 595 images of insulator flashover faults and 628 images of broken insulators, and the validation set contains 192 images of insulator flashover faults and 216 images of broken insulators.
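
A 3:1 random split with Python's built-in `random` module can be sketched as follows. The fixed seed and the placeholder file names are assumptions for illustration; flooring $0.75 \times 1630 = 1222.5$ reproduces the 1222/408 partition reported above.

```python
# Sketch: reproducible 3:1 random train/validation split using the stdlib.
import random
from typing import List, Sequence, Tuple


def split_dataset(items: Sequence[str], train_ratio: float = 0.75,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle a copy of the items and cut it into train and validation parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG: global state untouched
    n_train = int(len(shuffled) * train_ratio)  # floor
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    images = [f"img_{i:04d}.jpg" for i in range(1630)]  # hypothetical file names
    train, val = split_dataset(images)
    print(len(train), len(val))  # 1222 408
```

If the flipped and rotated copies of one source image are kept together (for example by splitting a list of source images before augmentation), near-duplicates cannot end up in both sets.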

4.5. Validation metrics

This paper evaluates the models in two dimensions: performance and complexity. Performance is measured by mAP, the mean of the per-class average precision (AP), which is computed from recall and precision as shown in Equations (5) to (8):

$$R\,(\%) = \frac{TP}{TP + FN} \times 100 \tag{5}$$

$$P\,(\%) = \frac{TP}{TP + FP} \times 100 \tag{6}$$

where TP is the number of positive samples that the model correctly predicts as positive, FN is the number of positive samples that the model incorrectly predicts as negative, and FP is the number of negative samples that the model incorrectly predicts as positive.

$$AP = \int_{0}^{1} P(R)\, dR \tag{7}$$

$$mAP(0.5)\,(\%) = \frac{\sum_{i=1}^{n} AP_{i}}{n} \times 100 \tag{8}$$

where $n$ is the number of categories and mAP(0.5) denotes mAP computed at an IoU threshold of 0.5.
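
Equation (7) is usually evaluated numerically from a ranked list of detections. The sketch below does this for one class using the common all-point interpolation (precision is first replaced by its running maximum from the right). It assumes each detection has already been matched to the ground truth as a TP or FP at IoU 0.5, and it is not necessarily the exact routine used to produce the reported numbers.

```python
# Sketch: precision, recall, AP (Eq. 7) and mAP (Eq. 8) for ranked detections.
from typing import List, Sequence


def average_precision(scores: Sequence[float], is_tp: Sequence[bool],
                      num_gt: int) -> float:
    """All-point interpolated area under the precision-recall curve."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recall: List[float] = []
    precision: List[float] = []
    for i in order:  # sweep the confidence threshold from high to low
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recall.append(tp / num_gt)          # Eq. 5 (as a fraction)
        precision.append(tp / (tp + fp))    # Eq. 6 (as a fraction)
    mrec = [0.0] + recall + [1.0]
    mpre = [0.0] + precision + [0.0]
    for j in range(len(mpre) - 2, -1, -1):  # make precision non-increasing
        mpre[j] = max(mpre[j], mpre[j + 1])
    return sum((mrec[j + 1] - mrec[j]) * mpre[j + 1] for j in range(len(mrec) - 1))


def mean_ap(per_class_ap: Sequence[float]) -> float:
    """Eq. 8: mean of the per-class APs, in percent."""
    return sum(per_class_ap) / len(per_class_ap) * 100.0


if __name__ == "__main__":
    ap = average_precision([0.9, 0.8, 0.7], [True, False, True], num_gt=2)
    print(round(ap, 4))                  # 0.8333
    print(round(mean_ap([ap, 1.0]), 2))  # 91.67
```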

For model complexity, three indices are used: the number of parameters, FLOPs and model size. The parameters and FLOPs of each layer are computed with Equations (9) to (11), and the totals for the whole network are obtained by summing over all layers:

$$\text{Parameters} = [r \times (f \times f) \times o] + o \tag{9}$$

$$\text{FLOPs}(\text{Conv}) = 2 \times H \times W \times (C_{in} \times K^{2} + 1) \times C_{out} \tag{10}$$

$$\text{FLOPs}(\text{Pool}) = \frac{W}{S} \times \frac{H}{S} \times C_{out} \tag{11}$$

where, in Equation (9), $r$ is the number of input channels of the layer, $f$ is the convolution kernel size and $o$ is the number of output channels (the trailing $+\,o$ counts one bias per output channel); in Equation (10), $H \times W$ is the size of the output feature map, $C_{in}$ is the number of input channels, $K$ is the kernel size and $C_{out}$ is the number of output channels; and $S$ in Equation (11) is the stride.
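
Equations (9) to (11) translate directly into code. The sketch below evaluates them for an illustrative $3 \times 3$ convolution from 64 to 128 channels producing an $80 \times 80$ output, and for a stride-2 pooling layer. The layer sizes are arbitrary examples, integer division in the pooling formula is an assumption, and the formulas are otherwise used exactly as written in the text, including the factor of 2 and the $+1$ bias term in Equation (10).

```python
# Sketch: per-layer complexity following Eqs. (9)-(11) as stated in the text.


def conv_params(r: int, f: int, o: int) -> int:
    """Eq. 9: r input channels, f x f kernel, o output channels (+ o biases)."""
    return r * (f * f) * o + o


def conv_flops(h: int, w: int, c_in: int, k: int, c_out: int) -> int:
    """Eq. 10: H x W output map, C_in input channels, K x K kernel."""
    return 2 * h * w * (c_in * k ** 2 + 1) * c_out


def pool_flops(w: int, h: int, s: int, c_out: int) -> int:
    """Eq. 11: W x H map pooled with stride S (integer division assumed)."""
    return (w // s) * (h // s) * c_out


if __name__ == "__main__":
    print(conv_params(64, 3, 128))          # 73856
    print(conv_flops(80, 80, 64, 3, 128))   # 945356800, i.e. about 0.95 GFLOPs
    print(pool_flops(80, 80, 2, 128))       # 204800
```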

4.6. Ablation experiment

To further evaluate the feature extraction ability of the SimAM attention mechanism and GhostConv, we compared the results of five structures. YOLOv5s-L-SimAM denotes the lightweight model that uses SimAM as its attention mechanism. As Table 3 shows, YOLOv5s-L-SimAM achieved the highest mAP of all the networks, 93.9%. Compared with YOLOv5s, YOLOv5s-L-SimAM improves mAP by a relative 0.9%, while its parameters and model size decrease by 30% and 36%, respectively.

Table 3. Comparison of the different attention mechanism schemes.

Figure 11 shows the training loss of YOLOv5s, YOLOv5s-L and YOLOv5s-L-SimAM (the lightweight YOLOv5 network with the SimAM attention mechanism embedded). The horizontal axis is the number of training iterations over the dataset, and the vertical axis is the training loss. With the same loss function for all models, the training loss of YOLOv5s-L-SimAM is always lower than that of YOLOv5s and the lightweight YOLOv5s-L, suggesting that combining the lightweight network with SimAM allows it to fit the insulator fault data more effectively.

4.7. Analysis of detection results

Figure 12 compares the detection results of the two algorithms in different scenarios. As Figure 12(A) shows, YOLOv5s fails to detect some faults when multiple faults are present. Moreover, when images are overexposed, the reduced image quality weakens the algorithm's ability to extract fault features. The proposed method alleviates these problems: Figure 12(B) shows that YOLOv5s-L-SimAM successfully detects the insulator defects and, even with multiple insulator failures or poor image quality, identifies the fault locations well.

Figure 11. Training loss.

Figure 12. Comparison of the detection results: (A) YOLOv5s; (B) YOLOv5s-L-SimAM.

4.8. Comparison of attention mechanisms

We also conducted and compared experiments fusing different attention mechanisms. YOLOv5s-L is the lightweight model that keeps the backbone network unchanged and inserts the lightweight convolution into the neck network. On this basis, the CBAM, SE and ECA attention mechanisms were each introduced in place of SimAM.

Figure 13 shows that YOLOv5s-SimAM improves its accuracy quickly in the first few epochs, and its final mAP is higher than those obtained with the other attention mechanisms. Table 4 shows that the mobile inverted residual bottleneck convolution fused with SimAM outperforms the other attention mechanisms. Incorporating an attention mechanism into the network is thus an effective way to mitigate the decline in mAP (mean average precision) caused by making the model lightweight (Figure 14).

Figure 13. Comparison of mAP trends during algorithm training.

Figure 14. Comparison of mean average accuracy of different algorithms.

Table 4. Comparison of mAP for different attention mechanism algorithms.

4.9. Comparison of different algorithms

We compared our model with several classic and state-of-the-art (SOTA) models. Table 5 indicates that Faster R-CNN, SSD and YOLOv4 cannot meet practical application requirements because of their low detection accuracy. Although SSD and YOLOv3 achieve considerable mAP, their parameter counts and computational complexity make them unsuitable for scenarios where edge computing resources are limited. Compared with YOLOv7, our network loses only 1.1% mAP, while its model size is only 1/7 of that of YOLOv7. Additionally, YOLOv7 adopts various novel modules that are not conducive to fast computation on edge devices. On the insulator dataset used in this article, YOLOv8 reaches an mAP of 93.2%, lower than that of our network, and it requires considerably more computational resources than our model. Therefore, our model has advantages in computational complexity, number of parameters and required computing power.

Table 5. Comparison of different models.

4.10. Hardware device deployment experiments

Furthermore, three models, YOLOv5s-L-SimAM, YOLOv5n and YOLOv5s, were compared in edge deployment experiments on the Jetson Nano, Jetson TX2 NX and Jetson Xavier NX edge devices. Table 6 shows that, in actual inference, our model runs 0.6 to 1.1 FPS faster than YOLOv5s on the different embedded devices.
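
FPS figures like these depend on how the timing is done. A minimal, framework-agnostic timing harness is sketched below: it runs a few untimed warm-up calls, then reports frames per second over a fixed number of runs. The `infer` callable stands in for one forward pass of a deployed model and is a hypothetical placeholder; when timing a PyTorch model on a Jetson GPU, one would typically also call `torch.cuda.synchronize()` before reading the clock, because CUDA kernels run asynchronously.

```python
# Sketch: measure inference throughput (frames per second) of any callable.
import time
from typing import Callable


def measure_fps(infer: Callable[[], object], warmup: int = 10, runs: int = 100) -> float:
    """Average FPS over `runs` calls, after `warmup` untimed calls."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    for _ in range(warmup):      # let caches, clocks and lazy initialisation settle
        infer()
    start = time.perf_counter()
    for _ in range(runs):
        infer()
    elapsed = time.perf_counter() - start
    return runs / max(elapsed, 1e-12)


if __name__ == "__main__":
    # Dummy CPU workload standing in for one model forward pass.
    fps = measure_fps(lambda: sum(i * i for i in range(10_000)))
    print(f"{fps:.1f} FPS")
```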

Table 6. Results of models inference on different edge devices.

5. Conclusion

Deploying lightweight neural networks to edge devices is a highly challenging task, and it is even harder to ensure that the model maintains adequate accuracy and real-time performance, since environmental factors and limited computing power both affect the final performance of the algorithm. In this article, we propose a lightweight network that combines attention mechanisms to address these issues. GhostConv, one of the lightweight methods proposed in this article, effectively prevents the generation of redundant feature maps, thereby improving the diversity of the model's feature maps, avoiding loss of feature information and passing information effectively to the next module. Because such a lightweight module inevitably lowers the efficiency of feature extraction, this article proposes a new module to address the issue. The C3_SimAM module incorporates the parameter-free attention module SimAM, which effectively improves the model's attention to target objects without requiring additional learnable parameters. Results show that YOLOv5s-L-SimAM achieves a satisfactory real-time detection frame rate on embedded devices with fewer parameters and less computation. Compared with YOLOv5s, it reaches an mAP of 93.9% while its model size is compressed to 9.4 MB, and its parameters and FLOPs decrease to 72% and 76% of those of YOLOv5s, respectively. Furthermore, the algorithm achieves a real-time frame rate of 9.5 FPS on the Jetson Xavier NX embedded computing device.

However, all experiments were carried out with an abundant energy supply. In actual operation the energy supply of embedded devices is not always ideal, yet the power consumption of the embedded system was not considered in this work. Moreover, the deployed algorithm was tested in a PyTorch environment, which is not well suited to embedded devices. In future work, we will focus on device power consumption and on optimising the algorithm under insufficient energy supply, so as to maintain detection accuracy in practice. We will also port the algorithm to different embedded platforms and build inference frameworks for edge computing, to make more effective use of the computing resources of edge devices.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by the scientific research project of Wenzhou under Grant ZF2022003.

References

Antwi-Bekoe, E., Liu, G., Ainam, J.-P., Sun, G., & Xie, X. (2022). A deep learning approach for insulator instance segmentation and defect detection. Neural Computing and Applications, 34(9), 7253–7269. https://doi.org/10.1007/s00521-021-06792-z
Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions.
Dai, Z. (2022). Uncertainty-aware accurate insulator fault detection based on an improved YOLOX model. Energy Reports, 8, 12809–12821. https://doi.org/10.1016/j.egyr.2022.09.195
Deng, F., Xie, Z., Mao, W., Li, B., Shan, Y., Wei, B., & Zeng, H. (2022). Research on edge intelligent recognition method oriented to transmission line insulator fault detection. International Journal of Electrical Power & Energy Systems, 139, 108054. https://doi.org/10.1016/j.ijepes.2022.108054
Feng, L., Zhang, L., Gao, Z., Zhou, R., & Li, L. (2023). Gabor-YOLONet: A lightweight and efficient detection network for low-voltage power lines from unmanned aerial vehicle images. Frontiers in Energy Research, 10, 960842. https://doi.org/10.3389/fenrg.2022.960842
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation.
Girshick, R. B. (2015). Fast R-CNN. CoRR, abs/1504.08083.
Han, G., He, M., Gao, M., Yu, J., Liu, K., & Qin, L. (2022). Insulator breakage detection based on improved YOLOv5. Sustainability, 14(10). https://doi.org/10.3390/su14106066
Han, G., Zhao, L., Li, Q., Li, S., Wang, R., Yuan, Q., He, M., Yang, S., & Qin, L. (2023). A lightweight algorithm for insulator target detection and defect identification. Sensors, 23(3). https://doi.org/10.3390/s23031216
Han, K., Wang, Y., Tian, Q., Guo, J., Xu, C., & Xu, C. (2020). GhostNet: More features from cheap operations.
Han, S., Lin, Y., Guo, Z., & Lv, K. (2023). A lightweight and style-robust neural network for autonomous driving in end side devices. Connection Science, 35(1), 2155613. https://doi.org/10.1080/09540091.2022.2155613
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9), 1904–1916. https://doi.org/10.1109/TPAMI.2015.2389824
Howard, A., Sandler, M., Chu, G., Chen, L.-C., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., Vasudevan, V., Le, Q. V., & Adam, H. (2019). Searching for MobileNetV3. CoRR, abs/1905.02244.
Hu, J., Shen, L., Albanie, S., Sun, G., & Wu, E. (2020). Squeeze-and-excitation networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(8), 2011–2023. https://doi.org/10.1109/TPAMI.2019.2913372
Li, J., Xu, Y., Nie, K., Cao, B., Zuo, S., & Zhu, J. (2023). PEDNet: A lightweight detection network of power equipment in infrared image based on YOLOv4-Tiny. IEEE Transactions on Instrumentation and Measurement, 72, 1–12. https://doi.org/10.1109/TIM.2023.3235416
Liu, C., Wu, Y., Liu, J., Sun, Z., & Xu, H. (2021). Insulator faults detection in aerial images from high-voltage transmission lines based on deep learning model. Applied Sciences, 11(10). https://doi.org/10.3390/app11104647
Liu, J., Liu, C., Wu, Y., Sun, Z., & Xu, H. (2022). Insulators' identification and missing defect detection in aerial images based on cascaded YOLO models. Computational Intelligence and Neuroscience, 2022, 7113765. https://doi.org/10.1155/2022/7113765
Miao, X., Liu, X., Chen, J., Zhuang, S., Fan, J., & Jiang, H. (2019). Insulator detection in aerial images for transmission line inspection using single shot multibox detector. IEEE Access, 7, 9945–9956. https://doi.org/10.1109/ACCESS.2019.2891123
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection.
Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6), 1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031
Salem, A. A., Lau, K. Y., Rahiman, W., Abdul-Malek, Z., Al-Gailani, S. A., Mohammed, N., Rahman, R. A., & Al-Ameri, S. M. (2022). Pollution flashover voltage of transmission line insulators: Systematic review of experimental works. IEEE Access, 10, 10416–10444. https://doi.org/10.1109/ACCESS.2022.3143534
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L.-C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks.
Sun, J., Song, S., Zheng, J., Li, Z., Huo, J., Wang, Y., Xiao, P., Akram, S., & Qin, D. (2022). A review on surface flashover phenomena at DC voltage in vacuum and compressed gas. IEEE Transactions on Dielectrics and Electrical Insulation, 29(1), 1–14. https://doi.org/10.1109/TDEI.2022.3148456
Wang, C.-Y., Mark Liao, H.-Y., Wu, Y.-H., Chen, P.-Y., Hsieh, J.-W., & Yeh, I. H. (2020). CSPNet: A new backbone that can enhance learning capability of CNN.
Wang, Y., Wang, J., Gao, F., Hu, P., Xu, L., Zhang, J., Yu, Y., Xue, J., & Li, J. (2018). Detection and recognition for fault insulator based on deep learning.
Woo, S., Park, J., Lee, J.-Y., & Kweon, I. S. (2018). CBAM: Convolutional block attention module.
Xing, Z., & Chen, X. (2022). Lightweight algorithm of insulator identification applicable to electric power engineering. Energy Reports, 8, 353–362. https://doi.org/10.1016/j.egyr.2022.01.209
Yang, L., Zhang, R.-Y., Li, L., & Xie, X. (2021). SimAM: A simple, parameter-free attention module for convolutional neural networks.
Yi, W., Ma, S., & Li, R. (2023). Insulator and defect detection model based on improved YOLO-S. IEEE Access, 1–1. https://doi.org/10.1109/ACCESS.2023.3309693
Zhang, K., Liu, J., Zhong, J., & Jing, Y. (2023). Diagnosis of contamination discharge state of porcelain insulators based on GA-CNN. Connection Science, 35(1), 2085666. https://doi.org/10.1080/09540091.2022.2085666
Zheng, H., Sun, Y., Liu, X., Djike, C. L. T., Li, J., Liu, Y., Ma, J., Xu, K., & Zhang, C. (2021). Infrared image detection of substation insulators using an improved fusion single shot multibox detector. IEEE Transactions on Power Delivery, 36(6), 3351–3359. https://doi.org/10.1109/TPWRD.2020.3038880
