DeFRCN-MAM: DeFRCN and multi-scale attention mechanism-based industrial defect detection method


ABSTRACT

With the development of technology, industrial defect detection based on deep learning has attracted extensive attention in the academic community. Unlike general visual objects, industrial defects are characterized by small sample sizes, weak visibility and irregular shapes, which hinder the application of related studies. To address these problems, this paper proposes a few-shot object detection (FSOD) method based on Decoupled Faster R-CNN (DeFRCN). First, because samples are scarce, the method includes a fine-tuning stage. Second, to adapt to the weak visibility of defects, we introduce the Feature Pyramid Network (FPN) and the Residual Attention Module (RAM) into DeFRCN, which enhances the capture of multi-scale features and feature association information. Furthermore, the feature representation ability is strengthened by connecting two channels in parallel, each consisting of an R-CNN head, a box classifier and a box regression model. Finally, the proposed network is pre-trained, fine-tuned and tested with the DAGM 2007 and NEU-DET public industrial defect datasets as the base classes and flange shaft defect data collected in the laboratory as the novel classes. To verify its effectiveness, we compare the proposed method with other classical FSOD methods. The comparison shows that the proposed method performs best in most settings.

Introduction

Industrial defect detection aims to capture appearance defects in various industrial products and is one of the key technologies for ensuring product quality and maintaining production stability. Related studies can be applied to scenarios such as unmanned inspection, intelligent inspection, production control and anomaly traceability. With the development of industrial imaging, computer vision, deep learning and other fields, industrial defect detection based on deep learning has become an effective method for product appearance inspection and has attracted strong attention from academia and industry. However, unlike general objects, industrial defects have several distinctive characteristics, i.e., a shortage of defective samples, poor visibility and irregular shapes. As a result, industrial defect detection struggles to meet the requirements of high precision and high speed at the same time, which hinders its practical application (Ren et al. 2021; Saberironaghi, Ren, and El-Gindy 2023; Tang et al. 2022). The main problems in this field and the related studies are summarized in Table 1.

Table 1. The main problems and related studies for industrial defect detection task.


First of all, the shortage of defective samples is mainly caused by the strict control of the defect rate in industrial production, the limited number of defective samples, and the high cost of labeling defective samples accurately. Hence, an imbalance between positive and negative samples is inevitable in industrial data. This problem is the core difficulty in industrial defect detection. To solve it, researchers have focused on small-sample learning (Imoto et al. 2019; Tang et al. 2020), semi-supervised learning (Gao et al. 2020; He et al. 2019), weakly supervised learning (Niu et al. 2019; Zhu et al. 2017), self-supervised learning (Fei et al. 2020; Ristea et al. 2021), etc. Among them, small-sample learning can directly improve model performance when samples are scarce. The main ideas include:

Modify the network structure and loss function to avoid the overfitting caused by scarce samples (Zhou et al. 2020; Xie et al. 2023)
Accelerate network convergence by designing the network structure and a stable optimization process (Tao et al. 2018)
Design targeted defect data so that defect samples are represented more comprehensively (Lin et al. 2020; Apostolopoulos et al. 2023)
Transfer the knowledge of models from other tasks to the industrial defect detection task, and fine-tune the pre-trained model with a small amount of data (Jing, Ma, and Zhang 2019; Zhang et al. 2023)

Among them, modifying the network structure and designing the loss function and optimization process require considerable theoretical experience. Sample enhancement can stack the gains of different enhancement methods, but the improvement is limited. Moreover, if the difference between the enhanced data and the real data is too large, practical application becomes more difficult. Few-shot object detection (FSOD) based on transfer learning, especially fine-tuning, has therefore attracted more attention. These methods reuse network weights pre-trained on a baseline dataset to improve generalization on a new domain with limited data. This type of FSOD usually involves novel categories from the target domain. At present, researchers mainly focus on the selection of pre-trained models to speed up fine-tuning.

Secondly, the poor visibility of industrial defects refers to low contrast between defects and the surrounding background, insufficient saliency and small differences between positive and negative samples. In addition, irregularity means that the same industrial product may have different kinds of defects, and defects of the same kind may also vary in shape, size and color. It is therefore necessary to study networks capable of detecting various types of defects to avoid missed detections (Akhyar et al. 2023; Cha et al. 2018). For example, Tao et al. (2018) provide a cascaded autoencoder for segmenting and localizing defects, which can adapt to various conditions. Similarly, segmentation has been applied to avoid missed detections (Tsai, Fan, and Chou 2021; Yang et al. 2022). Furthermore, the most classical approach to the poor visibility and irregularity problems is multi-scale feature fusion. Among such methods, the Feature Pyramid Network (FPN) (Lin et al. 2017) performs excellently on multi-scale problems and is often introduced into classical object detection networks. In defect detection, Li et al. (2018) detect steel surface defects with a single-stage detector, i.e., YOLO, and integrate shallow features to improve the detection of small defects. Mei, Yang, and Yin (2018) use an image pyramid structure to scale the image to three different sizes, to which noise can be added randomly; autoencoders are then introduced for image recovery and defect detection. The above methods improve the detection of industrial defects. They also verify that multi-scale feature fusion improves the detection of industrial defects characterized by poor visibility and irregular shape.

To address the problems of sample shortage, poor visibility and irregularity in industrial defect detection, this paper proposes a multi-scale attention-based FSOD model. Decoupled Faster R-CNN (DeFRCN) (Qiao et al. 2021) is the foundation of the proposed method, and its parameters are pre-trained on a large amount of data. The small-sample defect data are then used to fine-tune the model parameters. On the basis of the classical DeFRCN, FPN is introduced in the backbone to enhance the learning of different scales and of feature associations in industrial defects. Moreover, to improve the adaptability of the multi-scale features, we use the residual attention module (RAM) (Fei et al. 2017) to integrate the attention maps of each channel with the original feature maps. In addition, the R-CNN head, box classifier and box regression parts are connected in two-level parallel, and the relative relationship between feature channels is obtained through an outer product calculation to enhance the feature representation. The proposed method adapts to actual defects and thus promotes application in the industrial field.

Related Works

FSOD Methods

In recent years, image object detection based on deep learning has made remarkable achievements, and many mature detection models have emerged. However, these models all need a large number of labeled samples for training. In actual industrial defect detection, it is difficult to obtain high-quality labeled defect samples, which limits the application and popularization of these methods (Luo et al. 2020; Wen et al. 2022). On this basis, FSOD methods have gradually been researched and developed because of their small dependence on samples. According to their idea and model structure, existing FSOD methods can be divided into metric learning (Hsieh et al. 2019; Lin et al. 2020; Zhang et al. 2019), data augmentation (Gong et al. 2020; Lin et al. 2020), model structure (Chen, Jiang, and Zhao 2020; Fan et al. 2020), fine-tuning (Kang et al. 2018; Sun et al. 2021; Wang et al. 2020), meta-learning (Zhang et al. 2021; Zhu et al. 2020), ensemble (Fan et al. 2021; Han et al. 2021), etc. Methods based on metric learning are easy to implement, but their localization accuracy is relatively poor. For data augmentation, a large difference between the augmented samples and the real samples hinders practical application. Methods based on model structure provide effective auxiliary information by constructing a new model structure on top of a conventional detection model; they need more theoretical and empirical support and are not easy to implement. Meta-learning based methods are trained iteratively by task, and only a small number of iterations are needed for a specific task to achieve better performance; however, their iterative process is prone to non-convergence. Methods based on fine-tuning use a large amount of data to pre-train the model and then use a small number of new samples to fine-tune some of the parameters. Their detection accuracy is relatively high, but they are sensitive to hyperparameters. Ensemble methods integrate multiple methods and can draw on the advantages of each; however, their structure is relatively complex, which leads to heavy computation and high hardware requirements. Clearly, different types of methods have different characteristics. Industrial defect detection requires high detection accuracy because missed detections are costly. Therefore, this paper focuses on FSOD methods based on fine-tuning.

Recent fine-tuning based FSOD methods mostly pre-train a Convolutional Neural Network (CNN) on ImageNet classification. The whole object detector is then trained on the base classes, and finally fine-tuning is carried out on the novel classes. Wang et al. (2020) propose a fine-tuning based FSOD method built on Faster R-CNN. In the fine-tuning stage, the parameters of the feature extractor are frozen, while those of the classifier and the regressor are adjusted. In addition, Wu et al. (2020) propose to use FPN as the main network of Faster R-CNN to solve the problem of unbalanced scale distribution. On this basis, each target is extracted and transformed to different scales for subsequent PRN and detection processing. Qiao et al. (2021) extend Faster R-CNN by introducing a gradient decoupling layer for multilevel decoupling and a prototype calibration block for multitask decoupling, i.e., DeFRCN. The former redefines the feature forward operation and the gradient backward operation, which decouples the later layers from the earlier layers. The latter is a classification model based on offline prototypes, which takes the proposals as input and fuses the original classification score with an additional pairwise score for calibration. The overall structure of DeFRCN is shown in Figure 1.

Figure 1. The overall architecture of the DeFRCN model (Qiao et al. 2021).

Multi-Scale Object Detection Method

In view of the irregular characteristics of industrial defects, it is necessary to design a detection model that is suitable for defects of different sizes. Studies have shown that, in a general CNN object detection model, the shallow layers extract spatial information such as the edges of the target. As the network deepens, features are gradually abstracted and spatial information is progressively lost (Zeiler and Fergus 2014). In addition, object detection includes two subtasks: object classification and localization. For large objects with rich details, stronger information is needed as the basis for classification. For small objects with a low tolerance for deviation, more fine-grained spatial information is needed for accurate localization. At present, the most common methods for constructing multi-scale feature representations include: (1) feeding an image pyramid to extract features at different scales (Hao et al. 2017; Lu, Javidi, and Lazebnik 2015); (2) designing parallel branches inside the network to build a spatial pyramid (Lazebnik, Schmid, and Ponce 2019); (3) integrating feature layers of different depths inside the network to construct a feature pyramid (Cai et al. 2016; Liu et al. 2015), etc.

Firstly, multi-scale object detection based on an image pyramid randomly feeds images at different scales in the training stage, which forces the neural network to adapt to objects of different scales. In the test stage, the same image is detected several times at different scales, and the results are finally merged by non-maximum suppression (NMS). However, this kind of method not only requires a small batch size in the training stage, which affects the accuracy of the model, but also hinders practical application. Secondly, parallel branches with different parameters can be designed in the network. Each branch extracts features at a different spatial level based on its own receptive field, which also builds a spatial pyramid. For example, the SPP module in SPP-NET (Kaiming et al. 2015) adopts the multi-scale block method of SPM to pool each block, so that feature maps of arbitrary size can be converted into feature vectors of fixed length. Finally, feature layers of different depths can be integrated through a feature map pyramid. This type of method consists of feature fusion across depths based on cross-layer connections and spatial pyramid construction based on parallel branches with different receptive fields. Among them, FPN is a typical representative of the methods based on cross-layer connections (Lin et al. 2017). The basic idea is to detect multi-scale targets by combining the fine-grained spatial information of the shallow feature maps with the information of the deep feature maps. On the basis of RPN, this algorithm adds a top-down pathway. Starting from the deepest feature map, 1 × 1 convolution and upsampling align the current feature map with the shallower feature map, and the two are then fused by element-wise addition. The fusion with each shallower feature map is realized in the same way.
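
The top-down fusion step described above can be illustrated with a small sketch. It is only a minimal illustration under simplifying assumptions that are not from the paper: each feature map has a single channel (so a 1 × 1 convolution reduces to a scalar multiply), upsampling is nearest-neighbour by a factor of two, and the function names and weights are hypothetical.

```python
def upsample_nearest_2x(fmap):
    """Double the height and width of a 2-D map by repeating each value."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def lateral_1x1(fmap, weight):
    """On a single-channel map, a 1x1 convolution is a scalar multiply (assumption)."""
    return [[weight * v for v in row] for row in fmap]


def fuse_top_down(deep, shallow, w_deep=1.0, w_shallow=1.0):
    """Align the deep map with the shallow one, then add them element by element."""
    top = upsample_nearest_2x(lateral_1x1(deep, w_deep))
    lat = lateral_1x1(shallow, w_shallow)
    if len(top) != len(lat) or len(top[0]) != len(lat[0]):
        raise ValueError("shallow map must be exactly twice the size of the deep map")
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(top, lat)]


deep = [[1.0, 2.0],
        [3.0, 4.0]]                       # coarse 2x2 map from a deep layer
shallow = [[0.1] * 4 for _ in range(4)]   # fine-grained 4x4 map from a shallow layer
fused = fuse_top_down(deep, shallow)      # 4x4 map carrying both kinds of information
```

Each value of the deep map is spread over a 2 × 2 block of the fused map, so the coarse information reaches every fine-grained location without changing the resolution of the shallow map.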

Attention Mechanism

Due to its effectiveness and plug-and-play convenience, the attention mechanism has been used more and more widely in deep learning tasks. It can be roughly divided into two categories: single-path attention and multi-path attention. Single-path attention includes the Squeeze-and-Excitation network (SE-NET), Efficient Channel Attention (ECA-NET), etc. SE-NET (Jie et al. 2017), presented at CVPR 2018, first models the interdependence between feature maps. The importance of each feature map is then obtained adaptively by learning, and the parameters are updated based on this importance. ECA-NET (Wang et al. 2020) is an improvement of SE-NET. In this model, more accurate attention information is obtained through a local cross-channel interaction strategy without dimensionality reduction, adaptive selection of the one-dimensional convolution kernel size, and one-dimensional convolution. In other words, attention information is obtained by summarizing cross-channel information. In terms of multi-path attention, ResNeSt (Li et al. 2017) is an improvement of ResNet that includes a split-attention block. On the basis of ResNet, the raw images are fed into k cardinal branches in ResNeSt, whose outputs carry the weight information of different channels. In addition, the Convolutional Block Attention Module (CBAM) (Woo et al. 2018) builds two submodules, i.e., the Spatial Attention Module (SAM) and the Channel Attention Module (CAM), which integrate attention information in the spatial and channel dimensions, respectively. On this basis, the Residual Attention Module (RAM) (Fei et al. 2017) embeds depthwise separable convolution to obtain the attention map according to the features of each channel. The final output is then obtained by adding the channel attention map and the spatial attention map, applying a sigmoid, and multiplying the result with the original features. This method adapts well to super-resolution problems.

The Proposed Method

The Architecture of the DeFRCN-MAM Model

The overall structure of DeFRCN-MAM is shown in Figure 2. The foundation of the proposed model is DeFRCN. The main improvement is to introduce FPN and RAM into the backbone to improve the learning of multi-scale features and to obtain the association information of different channels and spatial feature maps. In addition, the R-CNN head, box classifier and box regression parts are connected in two-level parallel, and the relative relationship between feature channels is obtained through an outer product calculation to enhance the feature representation. Finally, the output of the network includes two parts: classification and localization.

Figure 2. The architecture of DeFRCN-MAM model.
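
The outer product mentioned above can be sketched as follows. The text does not specify which tensors enter the product, so this minimal example simply assumes that each of the two parallel branches yields one feature vector per proposal; the names `branch_a` and `branch_b` and the values are hypothetical.

```python
def outer_product(u, v):
    """Return the matrix whose (i, j) entry is u[i] * v[j]."""
    return [[a * b for b in v] for a in u]


def flatten(matrix):
    """Unroll a matrix row by row into a single descriptor vector."""
    return [x for row in matrix for x in row]


branch_a = [1.0, 2.0, 3.0]   # hypothetical feature vector from the first branch
branch_b = [0.5, -1.0]       # hypothetical feature vector from the second branch

pairwise = outer_product(branch_a, branch_b)   # 3 x 2 matrix of channel pairs
descriptor = flatten(pairwise)                 # length-6 vector
```

Every entry of `pairwise` couples one channel of the first branch with one channel of the second, which is one way to expose the relative relationship between feature channels to the following layers.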

The Architecture of the DeFRCN

Unlike the standard Faster R-CNN, DeFRCN involves two gradient decoupling layers (GDLs) and an offline prototype calibration block (PCB). The two GDLs are placed between the backbone and the RPN and between the backbone and the R-CNN, respectively, and their function is to adjust the degree of decoupling between the modules. During forward propagation, the GDL applies a learnable affine transformation $A$ to the feature maps; during backward propagation, it takes the gradient from the subsequent layer, multiplies it by a constant $\lambda \in [0, 1]$ and passes it to the preceding layer. In this way, the modules are decoupled effectively. Mathematically, the GDL can be treated as a pseudo-function $G_{(A, \lambda)}$ defined by two equations describing its forward- and backward-propagation behavior as follows:

$$G_{(A, \lambda)} (x) = A (x) \tag{1}$$

$$\frac{d G_{(A, \lambda)}}{dx} = \lambda \nabla_{A} \tag{2}$$

where $\nabla_{A}$ is the Jacobian matrix from the affine layer.
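
Equations (1) and (2) can be illustrated with a small hand-written forward and backward pass. This is only a sketch: it assumes that $A$ is a channel-wise affine map with one weight and one bias per channel, so that its Jacobian is diagonal, and the class name and values are hypothetical rather than taken from the DeFRCN code.

```python
class GradientDecouplingLayer:
    """Forward: channel-wise affine A(x) = w * x + b. Backward: gradient scaled by lam."""

    def __init__(self, weights, biases, lam):
        if not 0.0 <= lam <= 1.0:
            raise ValueError("lam must lie in [0, 1]")
        self.weights = list(weights)
        self.biases = list(biases)
        self.lam = lam

    def forward(self, x):
        # Eq. (1): G(x) = A(x), applied independently to each channel.
        return [w * v + b for w, v, b in zip(self.weights, x, self.biases)]

    def backward(self, grad_output):
        # Eq. (2): for a channel-wise affine map the Jacobian is diag(w),
        # so the gradient passed to the preceding layer is lam * w * grad_output.
        return [self.lam * w * g for w, g in zip(self.weights, grad_output)]


gdl = GradientDecouplingLayer(weights=[2.0, 0.5], biases=[1.0, -1.0], lam=0.25)
y = gdl.forward([3.0, 4.0])          # [7.0, 1.0]
grad_x = gdl.backward([1.0, 1.0])    # [0.5, 0.125]
```

With `lam=0.0` the preceding layer receives no gradient from this branch at all, and with `lam=1.0` the layer behaves like an ordinary affine layer, so $\lambda$ sets the degree of decoupling.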

In addition, in parallel with the box classifier module, the PCB is used to further calibrate the classifier score. Specifically, the PCB is equipped with a well-trained classification model and a set of new support prototypes. Taking the region proposals obtained by few-shot object detection as input, the PCB uses an additional prototype-based pairwise score to improve the original logistic regression score. Furthermore, since the PCB module is offline and needs no further training, it is plug-and-play and can easily be attached to other architectures to build stronger few-shot detectors.
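
A minimal sketch of this calibration idea is given below. It assumes that the pairwise score is the cosine similarity between a proposal feature and a class prototype, and that the two scores are combined by a weighted average with a trade-off weight `alpha`; the weight, the function names and the numbers are illustrative assumptions, not values from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def calibrate(cls_score, proposal_feat, prototype, alpha=0.5):
    """Blend the detector's class score with a prototype-based pairwise score."""
    pairwise_score = cosine(proposal_feat, prototype)
    return alpha * pairwise_score + (1.0 - alpha) * cls_score


prototype = [1.0, 0.0]   # hypothetical class prototype built offline from support images
aligned = calibrate(0.6, [2.0, 0.0], prototype)      # proposal resembles the prototype
orthogonal = calibrate(0.6, [0.0, 3.0], prototype)   # proposal does not resemble it
```

A proposal whose feature agrees with the prototype has its score raised, while a dissimilar proposal with the same classifier score is pushed down, and no parameter of the detector is retrained in the process.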

Multi-Scale Attention Mechanism

The backbone structure of the DeFRCN-MAM model is shown in Figure 3. To improve the representation of industrial defect features, FPN and RAM are introduced into the classic ResNet-101 model. This improvement enhances the learning of multi-scale features and captures the correlation information of different channel and spatial feature maps.

Figure 3. The architecture of backbone.

In the FSOD task, different domains have different feature distributions and different feature representations. Hence, it is necessary to introduce RAM into the backbone to improve the feature representation by stacking. The architecture of RAM is shown in Figure 4.

Figure 4. The architecture of RAM.

RAM consists of residual units, convolutional layers, pooling and upsampling operations, etc. Pooling and upsampling are used to extract multi-level semantic features. The residual units fuse these multi-level semantic features and provide richer information for classification. Concretely, $f (x)$ is the input feature map of RAM. After three max-pooling and upsampling operations, semantic features of $f (x)$ at different levels are obtained, from which the attention parameters $M (x)$ are extracted. Residual units are applied between pooling and upsampling; they do not change the dimensions of the feature maps but augment the feature information. The intermediate feature map $f^{'} (x)$ is obtained after these operations. Then, after feature information enhancement, the output feature map $f^{''} (x)$ is obtained as:

$$f^{''} (x) = (1 + M (x)) \, f^{'} (x) \tag{3}$$
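
Equation (3) can be checked on a toy example. The sketch below assumes that the product is taken element by element and that the mask $M (x)$ is produced by a sigmoid, so that it lies between 0 and 1; the function names and input values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def residual_attention(trunk, mask_logits):
    """Return (1 + M) * f' element by element, with M = sigmoid(mask_logits)."""
    return [[(1.0 + sigmoid(m)) * t for t, m in zip(t_row, m_row)]
            for t_row, m_row in zip(trunk, mask_logits)]


trunk = [[1.0, 2.0],
         [3.0, 4.0]]                 # stands in for f'(x)
mask_logits = [[0.0, 0.0],
               [-50.0, 50.0]]        # pre-sigmoid attention values
out = residual_attention(trunk, mask_logits)   # stands in for f''(x)
```

Under these assumptions each output value stays between the original feature and twice that feature. A mask close to zero therefore leaves the feature unchanged instead of erasing it, which is the purpose of the added 1 in Equation (3).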

Experiments and Discussions

Industrial Defect Datasets

At present, open datasets of industrial defects can be divided by task, i.e., supervised classification, supervised detection, supervised segmentation, unsupervised segmentation, and weakly supervised segmentation. In this paper, DAGM 2007 and NEU-DET are taken as the research objects. The DAGM 2007 dataset was provided by the competition at the DAGM 2007 Symposium; it contains 10 types of defect objects and is published for the weakly supervised segmentation task. Part of its data is shown in Figure 5(a). Six of the classes contain 150 defect images each, and the other four classes contain 300 defect images each. The NEU-DET dataset (Song and Yan 2013) is the Northeastern University surface defect database, which contains six typical defects of hot-rolled steel strips, i.e., crazing, inclusion, patches, pitted surface, rolled-in scale and scratches. This dataset is published for the supervised detection task and is shown in Figure 5(b). It includes 1800 grayscale images covering the six types of typical defects, with 300 samples per type. NEU-DET provides annotations in "xml" format that indicate the category and location of the defects in each image. In Figure 5(b), a yellow box indicates the location of each defect.

Figure 5. Open datasets, (a) DAGM 2007, (b) NEU-DET.

The experimental data in this paper are image samples of automobile flange shafts with defects. According to the defects to be detected, our team established the corresponding data acquisition environment, which is shown in Figure 6. Samples are collected by five cameras: four around the part and one directly above it. The data are then transmitted directly to the host computer through a data cable. On this basis, we establish a new part defect dataset.

Figure 6. Part image data acquisition environment.

Our new part defect dataset contains eight types of defects, i.e., gear, quench, contusion, bump, crack, threaded hole shielding, unthreaded hole and threaded hole. Examples are shown in Figure 7.

Figure 7. Partial samples in defects dataset established by our team.

The LabelImg software is used to label the collected samples. The labeling results are saved as files in "xml" format, which include the coordinates of the defects, the defect types, the image size, the image name, etc. About 1600 images are labeled. This is a typical few-shot object detection task, so FSOD methods need to be studied.
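
Such annotation files can be read with the Python standard library, as the sketch below illustrates. It assumes the Pascal VOC layout that LabelImg writes by default; the sample file name, image size and box coordinates are invented for illustration and are not taken from the flange shaft dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in Pascal VOC layout (values invented for illustration).
SAMPLE_XML = """<annotation>
  <filename>flange_0001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>crack</name>
    <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>180</xmax><ymax>200</ymax></bndbox>
  </object>
</annotation>"""


def parse_annotation(xml_text):
    """Extract the image name, image size and labeled boxes from one annotation."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    record = {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "objects": [],
    }
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        corners = tuple(int(box.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
        record["objects"].append({"label": obj.findtext("name"), "bbox": corners})
    return record


record = parse_annotation(SAMPLE_XML)
```

For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring(xml_text)`.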

Experimental Setting

Following previous work, we evaluate the performance of the proposed method on the segmentation data in DAGM 2007, on NEU-DET and on the flange shaft dataset established by our team. The data are summarized in Table 2. The 5421 images in the three datasets are divided into training and test sets in a 1:1 ratio. The 16 types of data in DAGM 2007 and NEU-DET form the base classes, and the data in the flange shaft dataset form the novel classes. Furthermore, we set the number of shots to k = 1, 2, 3, 5 and 10 in few-shot object detection. The assessment metrics are novel mean average precision (nAP), nAP50, nAP75, etc. In addition, the basic network is Faster R-CNN and the backbone is ResNet-101 pre-trained on ImageNet.
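
The difference between nAP50 and nAP75 lies in how strictly a predicted box must overlap the ground truth. The sketch below assumes the common COCO-style convention, in which a detection counts as correct at intersection-over-union (IoU) thresholds of 0.5 and 0.75, respectively; the boxes are hypothetical and given as (xmin, ymin, xmax, ymax).

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (xmin, ymin, xmax, ymax)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


ground_truth = (0.0, 0.0, 10.0, 10.0)
prediction = (0.0, 0.0, 10.0, 6.0)      # covers 60% of the ground-truth box

overlap = iou(ground_truth, prediction)
hit_at_50 = overlap >= 0.5              # counted as correct for nAP50
hit_at_75 = overlap >= 0.75             # rejected for nAP75
```

The same prediction is a true positive under the looser threshold and a false positive under the stricter one, which is why nAP75 rewards precise localization more strongly than nAP50.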

Table 2. Defects data situation.


The experimental settings are shown in Table 3. We apply Stochastic Gradient Descent (SGD) to optimize the network parameters.

Table 3. Training setting.


The experimental environment combines an Intel Xeon Silver 4210 2.20 GHz × 20 CPU with two NVIDIA RTX 3090 Ti 24 GB GPUs. The operating system is Ubuntu 18.04.6 LTS. We use PyTorch 1.7.1, CUDA 11.0 and cuDNN 8.0.5 to build the deep learning architecture. The training and fine-tuning settings are shown in Table 4.

Table 4. Training and fine tuning situations.


The experimental procedure is as follows:

Starting from ResNet-101-FPN pre-trained on ImageNet, the network is pre-trained again with the 16 types of defect samples in DAGM 2007 and NEU-DET.
Fine-tuning is performed on the pre-divided k-shot samples.
Each fine-tuned model is evaluated on the test set.
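
The k-shot subsets used in the second step can be drawn as in the sketch below. The paper does not describe how its splits were produced, so this is only one plausible procedure: annotations are assumed to be (image id, label) pairs, at most k images are kept per class, and a fixed seed makes the split reproducible. All names and data are hypothetical.

```python
import random
from collections import defaultdict


def sample_k_shot(annotations, k, seed=0):
    """Keep at most k annotated images per class from (image_id, label) pairs."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for image_id, label in annotations:
        by_class[label].append(image_id)
    subset = {}
    for label in sorted(by_class):
        ids = sorted(by_class[label])   # sort first so the shuffle is reproducible
        rng.shuffle(ids)
        subset[label] = ids[:k]
    return subset


# Hypothetical annotations: 20 images with cracks and 5 images with bumps.
annotations = [(f"img_{i:03d}", "crack") for i in range(20)]
annotations += [(f"img_{i:03d}", "bump") for i in range(20, 25)]

splits = {k: sample_k_shot(annotations, k) for k in (1, 2, 3, 5, 10)}
```

A class with fewer than k annotated images simply contributes everything it has, as happens for the five `bump` images in the 10-shot split.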

Ablation Study

According to the architecture of DeFRCN-MAM, the main contributions are the multi-scale feature fusion in the backbone, i.e., FPN, and the parallel connection of the R-CNN head, box classifier and box regression parts. In this part, we conduct an ablation study to show the effect of these contributions. Table 5 reports the comparison results in terms of novel mean average precision (nAP), nAP50 and nAP75. The compared networks are the classical DeFRCN, DeFRCN revised with FPN, DeFRCN with the parallel connection added, and DeFRCN with both FPN and the parallel connection.

Table 5. Ablation comparison results.


According to Table 5, with the assistance of FPN and the parallel connection, the detection model achieves the best nAP, nAP50 and nAP75 in most conditions. Moreover, the results of the FPN-revised DeFRCN are the closest to the optimal ones, which shows that multi-scale feature fusion based on FPN is effective in improving the detection results. For example, the nAP75 values of the FPN-revised DeFRCN are 7.97% and 3.34% lower than the optimal ones in the 5-shot and 10-shot conditions, but higher than those of the other variants. In short, this ablation study verifies the contribution of multi-scale feature fusion and the parallel connection to defect detection.

Detection Results and Analysis

The training process of DeFRCN-MAM on the base class samples, which come from the DAGM 2007 and NEU-DET datasets, is shown in Figure 8. The loss values of box regression and classification first drop sharply and then decrease smoothly, which shows that the training process is normal. Moreover, the learning rate is dynamic. When the loss value is relatively large, the learning rate is large too. Conversely, when the loss value oscillates, the learning rate is small, which supports parameter learning.

Figure 8. Loss and learning rate of training process based on base class samples.

Starting from the model trained on the base class samples, the model is fine-tuned on the novel class samples, and the defects in the novel class dataset are detected by the resulting model. This part of the training process is shown in Figure 9. The loss values of box regression and classification drop sharply in the first 1500 epochs and remain stable after that. This illustrates that the proposed model can transfer quickly to the new dataset, i.e., the flange shaft dataset. Compared with training on the base class dataset, fine-tuning on the novel class dataset can be completed in a handful of epochs.

Figure 9. Loss and learning rate of training process based on novel class samples.

This part compares the detection results of existing methods with those of the DeFRCN-MAM model. The existing methods include FRCN+ft-full, TFA w/fc, TFA w/cos, few-shot object detection via contrastive proposal encoding (FSCE) and DeFRCN. The detection results are compared under different sample quantities, i.e., 1-shot, 2-shot, 3-shot, 5-shot and 10-shot, as shown in Table 6. The assessment metric is the mean average precision (mAP).

Table 6. Detection mAP (%) comparison between existing methods and DeFRCN-MAM.


According to the results in Table 6, the proposed DeFRCN-MAM obtains the optimal detection results for a variety of defects across different numbers of samples. In particular, for threaded hole shielding defects it obtains the optimal mAP value in the 1-shot, 2-shot, 3-shot, 5-shot and 10-shot cases, exceeding the existing methods by at least 2.56%, 5.08%, 2.64%, 0.85% and 2.85%, respectively. In addition, for quench, contusion and bump defects, the proposed method has better detection results in most cases. However, for gear, crack and unthreaded hole defects, the detection performance of the proposed method is inferior to that of the existing classical methods. Considering the characteristics of the different defect types, the detection performance of the proposed method is significantly improved for industrial defects with small size and poor clarity. Although the proposed method is not the strongest for defects with relatively strong visibility, such as gear and crack, it is still close to the existing optimal detection results. This experiment verifies that DeFRCN-MAM has excellent detection capability for irregular defect targets.

To further demonstrate the detection advantages of DeFRCN-MAM, we statistically analyze the overall detection performance of the different methods under the various sample quantities. The evaluation metrics are nAP, nAP50 and nAP75. The comparison results are shown in Table 7.

Table 7. Detection result comparison based on several metrics.


The results in Table 7 show that DeFRCN-MAM achieves the optimal detection results across different sample sizes, especially under the nAP and nAP75 metrics. Under the nAP50 metric, the detection results of DeFRCN-MAM are also close to the existing optimal results. The experimental results further verify that the proposed method has strong industrial defect detection capability.

In addition, Figure 10 shows some of the defect detection results in the 10-shot case, which visually illustrate the detection performance of DeFRCN-MAM. Among them, the accuracy for unthreaded hole, threaded hole and quench is above 0.95, which further demonstrates the adaptability of DeFRCN-MAM to industrial defects.

Figure 10. Some detection results of DeFRCN-MAM.

Conclusion

This paper proposes DeFRCN-MAM to address the small sample size, weak visibility and irregular shape of industrial defect objects. DeFRCN is the foundation of the proposed method, and its core contributions, i.e., GCL and PCB, suit the few-shot detection problem. Moreover, we introduce FPN and RAM into the backbone to enhance the multi-scale feature expression of the model and the learning of feature correlation, so that the proposed model can better adapt to the weak visibility and irregular shape of industrial defects. In addition, the feature expression is enhanced by connecting the R-CNN head, box classifier and box regression in parallel.

In the experiments, we pre-train the proposed model with the public industrial defect datasets DAGM 2007 and NEU-DET as the base class. Then, we take the flange shaft dataset collected in the laboratory as the new class to fine-tune the model and test it on industrial defects. The ablation study verifies the rationality of the DeFRCN-MAM structure. Moreover, DeFRCN-MAM is compared with several classical FSOD methods, including FRCN, TFA, FSCE and DeFRCN. The results show that DeFRCN-MAM has the strongest ability to detect industrial defects with poor visibility and small size across the different sample sizes. In particular, for threaded hole shielding, it obtains the best mAP values under the 1-shot, 2-shot, 3-shot, 5-shot and 10-shot settings, which are at least 2.56%, 5.08%, 2.64%, 0.85% and 2.85% higher than those of the other methods, respectively. For defects with strong visibility, the proposed method is not the best, but it remains close to the best existing results. Overall, DeFRCN-MAM shows stronger adaptability to industrial defects and better detection performance. This study also provides a design idea for FSOD methods in subsequent deep learning-based industrial defect detection.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

The work was supported by Beijing Natural Science Foundation [4202015]; and Chinese University Industry-University-Research Innovation Fund - Blue Dot Distributed Intelligent Computing Project [2021LDA06002].

References

Akhyar, F., Y. Liu, C. Hsu, T. K. Shih, and C. Lin. 2023. FDD: A deep learning–based steel defect detectors. The International Journal of Advanced Manufacturing Technology 126 (3–4):1093–22. doi:10.1007/s00170-023-11087-9
Apostolopoulos, I. D., and M. A. Tzani. 2023. Industrial object and defect recognition utilizing multilevel feature extraction from industrial scenes with deep learning approach. Journal of Ambient Intelligence and Humanized Computing 14 (8):10263–76. doi:10.1007/s12652-021-03688-7
Cai, Z., Q. Fan, R. S. Feris, and N. Vasconcelos. 2016. A unified multi-scale deep convolutional neural network for fast object detection. Springer, Cham.
Cha, Y. J., W. Choi, G. Suh, S. Mahmoudkhani, and O. Buyukozturk. 2018. Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types. Computer-Aided Civil and Infrastructure Engineering 33 (9):731–47. doi:10.1111/mice.12334
Chen, X., M. Jiang, and Q. Zhao. 2020. Leveraging bottom-up and top-down attention for few-shot object detection. arXiv:2007.12104.
Fan, Z., Y. Ma, Z. Li, and J. Sun. 2021. Generalized few-shot object detection without forgetting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 4527–36.
Fan, Q., W. Zhuo, C. K. Tang, and Y. W. Tai. 2020. Few-shot object detection with attention-RPN and multi-relation detector. Paper read at IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.
Fei, Y., C. Huang, J. Cao, M. Li, and C. Lu. 2020. Attribute restoration framework for anomaly detection. IEEE Transactions on Multimedia 24 (99):1–1.
Fei, W., M. Jiang, Q. Chen, S. Yang, and X. Tang. 2017. Residual attention network for image classification. Paper read at IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.
Gao, Y., L. Gao, X. Li, and X. Yan. 2020. A semi-supervised convolutional neural network-based method for steel surface defect recognition. Robotics and Computer-Integrated Manufacturing 61:101825. doi:10.1016/j.rcim.2019.101825
Gong, Y., H. Shao, J. Luo, and Z. Li. 2020. A deep transfer learning model for inclusion defect detection of aeronautics composite materials. Composite Structures 252:112681. doi:10.1016/j.compstruct.2020.112681
Han, G., S. Huang, J. Ma, Y. He, and S. F. Chang. 2021. Meta faster R-CNN: Towards accurate few-shot object detection with attentive feature alignment. Paper read at The Thirty-Sixth AAAI Conference on Artificial Intelligence, Online.
Hao, Z., Y. Liu, H. Qin, J. Yan, X. Li, and X. Hu. 2017. Scale-aware face detection. Paper read at Computer Vision & Pattern Recognition, Hawaii Honolulu, USA.
He, Y., K. Song, H. Dong, and Y. Yan. 2019. Semi-supervised defect classification of steel surface based on multi-training and generative adversarial network. Optics and Lasers in Engineering 122:294–302. doi:10.1016/j.optlaseng.2019.06.020
Hsieh, T. I., Y. C. Lo, H. T. Chen, and T. L. Liu. 2019. One-shot object detection with co-attention and co-excitation. Conference on Neural Information Processing Systems, Jaipur, India.
Imoto, K., T. Nakai, T. Ike, K. Haruki, and Y. Sato. 2019. A CNN-based transfer learning method for defect classification in semiconductor manufacturing. IEEE Transactions on Semiconductor Manufacturing 32 (4):455–59. doi:10.1109/TSM.2019.2941752
Jie, H., S. Li, S. Gang, and S. Albanie. 2017. Squeeze-and-excitation networks. IEEE Transactions on Pattern Analysis & Machine Intelligence 42 (8).
Jing, J. F., H. Ma, and H. H. Zhang. 2019. Automatic fabric defect detection using a deep convolutional neural network. Coloration Technology 135 (3):213–23. doi:10.1111/cote.12394
Kaiming, H., Z. Xiangyu, R. Shaoqing, and S. Jian. 2015. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (9):1904–16. doi:10.1109/TPAMI.2015.2389824
Kang, B., L. Zhuang, W. Xin, F. Yu, and T. Darrell. 2018. Few-shot object detection via feature reweighting. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 8420–29.
Lazebnik, S., C. Schmid, and J. Ponce. 2019. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. IEEE Conference on Computer Vision & Pattern Recognition, New York, NY, USA, 2169–78.
Li, X., L. Ding, W. Li, and C. Fang. 2017. FPGA accelerates deep residual learning for image recognition. Paper read at IEEE 2nd Information Technology, Networking, Electronic and Automation Control Conference, Chengdu, China.
Lin, D., Y. Cao, W. Zhu, and Y. Li. 2020. Few-shot defect segmentation leveraging abundant normal training samples through normal background regularization and crop-and-paste operation. arXiv:2007.09438.
Lin, T. Y., P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie. 2017. Feature pyramid networks for object detection. Paper read at IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.
Li, J., Z. Su, J. Geng, and Y. Yin. 2018. Real-time detection of steel strip surface defects based on improved YOLO detection network. IFAC-Papersonline 51 (21):76–81. doi:10.1016/j.ifacol.2018.09.412
Liu, W., D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, and A. C. Berg. 2015. SSD: Single shot multibox detector. European Conference on Computer Vision, Zurich, Switzerland, 21–37.
Lu, Y., T. Javidi, and S. Lazebnik. 2015. Adaptive object detection using adjacency and zoom prediction. IEEE Computer Society, Las Vegas, NV, USA.
Luo, Q., X. Fang, L. Liu, C. Yang, and Y. Sun. 2020. Automated visual defect detection for flat steel surface: A survey. IEEE Transactions on Instrumentation and Measurement 69 (3):626–44. doi:10.1109/TIM.2019.2963555
Mei, S., H. Yang, and Z. Yin. 2018. An unsupervised-learning-based approach for automated defect inspection on textured surfaces. Fortschritte der Physik 67 (6):1266–77.
Niu, S., H. Lin, T. Niu, B. Li, and X. Wang. 2019. DefectGAN: Weakly-supervised defect detection using generative adversarial network. Paper read at IEEE 15th International Conference on Automation Science and Engineering, Vancouver, BC, Canada.
Qiao, L., Y. Zhao, Z. Li, X. Qiu, and C. Zhang. 2021. DeFRCN: Decoupled faster R-CNN for few-shot object detection. Paper read at Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada.
Ren, Z., F. Fang, N. Yan, and Y. Wu. 2021. State of the art in defect detection based on machine vision. International Journal of Precision Engineering and Manufacturing-Green Technology 9 (2):9–12. doi:10.1007/s40684-021-00343-6
Ristea, N. C., N. Madan, R. T. Ionescu, K. Nasrollahi, F. S. Khan, T. B. Moeslund, and M. Shah. 2021. Self-supervised predictive convolutional attentive block for anomaly detection. IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.
Saberironaghi, A., J. Ren, and M. El-Gindy. 2023. Defect detection methods for industrial products using deep learning techniques: A review. Algorithms 16 (2):95. doi:10.3390/a16020095
Song, K., and Y. Yan. 2013. A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects. Applied Surface Science 285 (21):858–64. doi:10.1016/j.apsusc.2013.09.002
Sun, B., B. Li, S. Cai, Y. Yuan, and C. Zhang. 2021. FSCE: Few-shot object detection via contrastive proposal encoding. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.
Tang, B., L. Chen, W. Sun, and Z. Lin. 2022. Review of surface defect detection of steel products based on machine vision. IET Image Processing 17 (2):303–22. doi:10.1049/ipr2.12647
Tang, T. W., W. H. Kuo, J. H. Lan, C. F. Ding, H. Hsu, and H. T. J. S. Young. 2020. Anomaly detection neural network with dual auto-encoders GAN and its industrial inspection applications. Sensors 20 (12):3336. doi:10.3390/s20123336
Tao, X., Z. Wang, Z. Zhang, D. Zhang, D. Xu, X. Gong, and L. Zhang. 2018. Wire defect recognition of spring-wire socket using multitask convolutional neural networks. IEEE Transactions on Components, Packaging, and Manufacturing Technology 8 (4):689–98. doi:10.1109/TCPMT.2018.2794540
Tao, X., D. Zhang, W. Ma, X. Liu, and D. Xu. 2018. Automatic metallic surface defect detection and recognition with convolutional neural networks. Applied Sciences 8 (9):1575. doi:10.3390/app8091575
Tsai, D.-M., S.-K. S. Fan, and Y.-H. Chou. 2021. Auto-annotated deep segmentation for surface defect detection. IEEE Transactions on Instrumentation and Measurement 70:5011410. doi:10.1109/TIM.2021.3087826
Wang, X., T. E. Huang, T. Darrell, J. E. Gonzalez, and F. Yu. 2020. Frustratingly simple few-shot object detection. arXiv:2003.06957.
Wang, Q., B. Wu, P. Zhu, P. Li, and Q. Hu. 2020. ECA-Net: Efficient channel attention for deep convolutional neural networks. Paper read at IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.
Wen, X., J. Shan, Y. He, and K. Song. 2022. Steel surface defect recognition: A survey. Coatings 13 (1):17. doi:10.3390/coatings13010017
Woo, S., J. Park, J. Y. Lee, and I. S. Kweon. 2018. CBAM: Convolutional block attention module. Paper read at European Conference on Computer Vision, Munich, Germany.
Wu, J., S. Liu, D. Huang, and Y. Wang. 2020. Multi-scale positive sample refinement for few-shot object detection. European Conference on Computer Vision, Online, 456–72.
Xie, Y., W. Hu, S. Xie, and L. He. 2023. Surface defect detection algorithm based on feature-enhanced YOLO. Cognitive Computation 15 (2):565–79. doi:10.1007/s12559-022-10061-z
Yang, L., J. Fan, B. Huo, E. Li, and Y. Liu. 2022. A nondestructive automatic defect detection method with pixelwise segmentation. Knowledge-Based Systems 242:108338. doi:10.1016/j.knosys.2022.108338
Zeiler, M. D., and R. Fergus. 2014. Visualizing and understanding convolutional networks. European Conference on Computer Vision, Zurich, Switzerland.
Zhang, G., Z. Luo, K. Cui, and S. Lu. 2021. Meta-DETR: Few-shot object detection via unified image-level meta-learning. arXiv:2103.11731.
Zhang, H., R. Pan, F. Chang, L. He, Z. Dong, and J. Yang. 2023. Zero-DD: Zero-sample defect detection for industrial products. Computer and Electrical Engineering 105 (108516):108516. doi:10.1016/j.compeleceng.2022.108516
Zhang, T., Y. Zhang, X. Sun, H. Sun, M. Yan, X. Yang, and K. Fu. 2019. Comparison network for one-shot conditional object detection. Computer Vision & Pattern Recognition, Los Angeles, USA.
Zhou, K., Y. Xiao, J. Yang, J. Cheng, W. Liu, W. Luo, Z. Gu, J. Liu, and S. Gao. 2020. Encoding structure-texture relation with P-Net for anomaly detection in retinal images. European Conference on Computer Vision, Online, 360–77.
Zhu, J.-Y., T. Park, P. Isola, and A. E. Efros. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 2223–32.
Zhu, X., W. Su, L. Lu, B. Li, and J. Dai. 2020. Deformable DETR: Deformable transformers for end-to-end object detection. arXiv:2010.04159.

Table 1. The main problems and related studies for industrial defect detection task.