Research Article

YOLOSeaShip: a lightweight model for real-time ship detection

Article: 2307613 | Received 15 Sep 2023, Accepted 15 Jan 2024, Published online: 31 Jan 2024

ABSTRACT

With the rapid advancement of computer vision, deep learning-based ship detection models have become increasingly prevalent. However, most existing methods raise detection accuracy at the cost of large networks and expensive hardware. In response to this challenge, a lightweight real-time detection approach called YOLOSeaShip is proposed. Firstly, starting from the YOLOv7-tiny model, partial convolution is used to replace the original 3×3 convolution in the ELAN module, further reducing the number of parameters and improving inference speed. Secondly, the parameter-free average attention module is integrated to improve the ability to locate a ship's hull in an image. Finally, the accuracy of the Focal EIoU hybrid loss function under different parameter settings is studied. Results on the SeaShips (7000) dataset demonstrate that the proposed method detects and classifies ship positions efficiently, with an mAP of 0.976 and 119.84 FPS, making it well suited to real-time ship detection applications.

Introduction

Ship detection is essential for the development of intelligent marine traffic and the real-time collection of visual maritime traffic information (Zhen et al., Citation2023). However, due to the complexity and variability of the marine environment, traditional target detection methods are prone to missed and false detections, which prevents them from meeting the demands of safe ship navigation (Escorcia-Gutierrez et al., Citation2022; Guo et al., Citation2023; Perera & Soares, Citation2015). Recently, with the progress of computer vision technology, ship detection has been widely applied in marine traffic monitoring, marine environmental protection, maritime sovereignty maintenance, naval strategic deployment, and so on (T. Zhang & Zhang, Citation2019; Zhang et al., Citation2019; Zhou & Chen, Citation2021).

In general, deep learning-based target detection algorithms fall into two types. The first is the two-stage approach known as regions with convolutional neural network features (R-CNN) (Girshick, Citation2015; He et al., Citation2017; Ren et al., Citation2017), which is accurate but too slow for rapid ship detection. The other category comprises one-stage models, such as the single-shot multibox detector (SSD) (Y. Jiang et al., Citation2019; Shi et al., Citation2021; X. Wang et al., Citation2018) and you only look once (YOLO) (J. Wu et al., Citation2023; Xu et al., Citation2022; Zhu et al., Citation2023), which are often more efficient than two-stage object detection methods. Both families have been applied to ship detection. Inspired by R-CNN, Zhang et al. (Citation2019) presented an enhanced Faster R-CNN for ship detection in high-resolution remote sensing images: ROI features are extracted from the feature map to avoid repeated feature computation, and an SVM classifier is used to further improve classification performance. Wen et al. (Citation2023) built a new encoder-decoder framework based on SSD after an in-depth study of single-shot detection with multi-scale sensing, improving detection accuracy for small ship targets and robustness to scale variance. In addition, Y. Chen et al. (Citation2021) trained a YOLOv5 model on an optical ship dataset, demonstrating the wide application prospects of this method in ship detection.

As the computer vision field evolves, the YOLO series continues to incorporate the latest research findings through iterative optimization. In terms of multi-object detection, YOLOv7 (C. Y. Wang et al., Citation2023), one of the most recent methods in the YOLO series, outperforms many sophisticated object detection models, and numerous researchers have refined and enhanced it for various detection targets. For example, Patel et al. (Citation2022) combined the advantages of a graph neural network and the YOLOv7 framework for ocean monitoring in high-resolution satellite imagery. In 2023, W. Wu et al. (Citation2023) proposed an improved YOLOv7 algorithm that replaces the anchor boxes with ones better suited to ship positioning, thereby better capturing ship characteristics at different scales. In the same year, Z. Chen et al. (Citation2023) from Guangdong Ocean University presented the CSD-YOLO algorithm, which uses extended techniques and adds an SAS-FPN module to draw attention to important information.

However, the majority of detection models used in current research are large, with many parameters and high hardware requirements. To perform ship recognition tasks effectively, a lightweight model with minimal hardware demands is needed, one that can be deployed readily on low-configuration machines. Researchers have proposed several lightweight convolution methods, most of them based on depth-wise convolution (Howard et al., Citation2017). Among them, the partial convolution module (PConv) (J. Chen et al., Citation2023), the most recent lightweight convolution technique, is adopted in this work owing to its speed. Additionally, an attention mechanism is introduced to strengthen the model's ability to locate regions of interest (ROI). To avoid further growth in the number of parameters, the parameter-free average attention module (PfAAM) (Körber, Citation2022), which computes attention weights by simple spatial and channel averaging without any learnable parameters, is added to the revised model.

To address the aforementioned ship detection issues, this study develops a lightweight ship target recognition algorithm called YOLOSeaShip, based on an enhanced YOLOv7. YOLOSeaShip improves YOLOv7-tiny to address the issue that current ship detection models are often large and demand expensive equipment. To lighten the model, PConv replaces the original basic convolution module. PfAAM is then introduced to reinforce information about the region of interest, addressing insufficient localization ability. Finally, the Focal and efficient IOU loss function (Y. F. Zhang et al., Citation2022) is adopted, and the effects of different parameter settings on detection performance are investigated. Compared with YOLOv7-tiny, YOLOSeaShip increases ship detection accuracy while still meeting detection-speed requirements, offers a better basis for subsequent complex tasks such as ship tracking and recognition, and meets the needs of intelligent border and coastal defense construction.

The remainder of the paper is structured as follows: the next section describes the structure and key modules of our model; the third section presents the experimental results and analysis; and the final section concludes the paper.

Method

Details of YOLOSeaShip

The YOLOSeaShip model consists of three parts: input, backbone, and head. Its structure is shown in Figure 1.

The input stage consists of mosaic data augmentation, anchor box calculation, and adaptive image scaling (Yao et al., Citation2021). Similar in principle to CutMix, the mosaic method randomly replaces part of an image with the pixel values of the corresponding region of another image; this both increases the number of small-target samples and improves the robustness and generalization ability of the network. By analyzing the size, shape, and distribution of the targets in the dataset, the K-means algorithm clusters the widths and heights of the target bounding boxes to obtain a set of anchor box sizes suited to the dataset, thus avoiding the errors introduced by manually designed priors. Adaptive image scaling resizes the input image according to object scale, adapting to the detection of objects of different sizes and effectively addressing scale inconsistency.

The backbone comprises CBL, ELAN_tiny, MaxPool, and PfAAMLayer modules. CBL is the basic convolution operation in our method, following the sequence convolution, batch normalization, and LeakyReLU activation. The ELAN_tiny module splits the input into two branches: one passes through a residual structure of multiple CBL blocks, the other undergoes a series of convolution operations, and the branches are finally concatenated (a sketch of these blocks is given after Figure 1). MaxPool performs downsampling for dimensionality reduction, and a PfAAMLayer is added after each pooling layer to capture the characteristics of the region of interest.

The head uses SPPCSPC_tiny, CBL, ELAN_tiny, and upsampling components. SPPCSPC_tiny combines local and global features to increase the feature-capture capability of the method; in YOLOSeaShip it uses 5 × 5, 9 × 9, and 13 × 13 max pooling. Upsampling enlarges and restores the feature maps. The model outputs three feature vectors of different sizes for detecting objects of different scales.

Figure 1. The overall structure of YOLOSeaShip.
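To make the block descriptions concrete, the following is a minimal PyTorch sketch of the CBL block and the split-and-concatenate pattern of ELAN_tiny, written from the description above; the channel widths, the number of stacked convolutions, and the class names are illustrative assumptions rather than the exact YOLOSeaShip configuration.

```python
import torch
import torch.nn as nn

class CBL(nn.Module):
    """Conv -> BatchNorm -> LeakyReLU, the basic block described above."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class ELANTiny(nn.Module):
    """Split the input into two branches, run one through stacked convolutions,
    then concatenate all intermediate outputs and fuse them with a 1x1 CBL."""
    def __init__(self, c_in, c_out):
        super().__init__()
        c_mid = c_in // 2
        self.branch1 = CBL(c_in, c_mid, k=1)      # shortcut branch
        self.branch2 = CBL(c_in, c_mid, k=1)      # main branch entry
        self.conv3 = CBL(c_mid, c_mid, k=3)
        self.conv4 = CBL(c_mid, c_mid, k=3)
        self.fuse = CBL(4 * c_mid, c_out, k=1)    # concat -> 1x1 fusion

    def forward(self, x):
        y1 = self.branch1(x)
        y2 = self.branch2(x)
        y3 = self.conv3(y2)
        y4 = self.conv4(y3)
        return self.fuse(torch.cat([y1, y2, y3, y4], dim=1))

# Toy usage: spatial size is preserved, channels are fused to c_out
x = torch.randn(1, 64, 80, 80)
print(ELANTiny(64, 64)(x).shape)  # torch.Size([1, 64, 80, 80])
```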

Partial convolution module

To extract spatial features, the partial convolution (PConv) module applies a standard convolution to only a portion of the input channels while leaving the remaining channels unaltered. The first or last contiguous block of channels is taken as representative of the complete feature map for the computation, which gives contiguous and regular memory access. The numbers of channels in the input and output feature maps are the same, and generality is not lost. We introduce PConv into the ELAN structure to further reduce the number of parameters of the model and improve operational efficiency, as displayed in Figure 2.

Figure 2. Partial convolution module. (a) PConv; (b) ELAN_tiny.
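The following minimal PyTorch sketch illustrates the PConv idea described above; convolving the first quarter of the channels (n_div = 4) follows the common FasterNet default, which this paper does not specify.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution: apply a regular 3x3 convolution to only the first
    c_in // n_div channels and pass the remaining channels through untouched."""
    def __init__(self, c_in, n_div=4):
        super().__init__()
        self.c_conv = c_in // n_div              # channels that get convolved
        self.c_pass = c_in - self.c_conv         # channels left unaltered
        self.partial_conv = nn.Conv2d(self.c_conv, self.c_conv, 3, 1, 1, bias=False)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.c_conv, self.c_pass], dim=1)
        x1 = self.partial_conv(x1)
        return torch.cat([x1, x2], dim=1)

# Toy usage: the channel count is preserved, but only a quarter is convolved
x = torch.randn(1, 64, 40, 40)
print(PConv(64)(x).shape)  # torch.Size([1, 64, 40, 40])
```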

Parameter-free average attention module

The overall architecture of the parameter-free average attention module (PfAAM) is depicted in Figure 3. Given an input feature map of size H × W × C, PfAAM obtains the spatial attention component $A_{sp}$ by averaging the input features along the channel dimension. Likewise, the channel attention $A_{ch}$ is obtained by averaging along the spatial dimensions of the feature map. The resulting attention maps are expanded back along their reduced dimensions and recombined to describe the most important parts of the input feature map. The recombined attention map is then passed through a sigmoid gate that rescales the input representation. This can be expressed as:

Figure 3. Parameter-free average attention module.

(1) $A_{sp}(x_{H \times W}) = \dfrac{1}{C} \sum_{i=1}^{C} x_{H \times W}(i)$

(2) $A_{ch}(y_{C}) = \dfrac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} y_{C}(i,j)$

(3) $F' = \sigma(A_{sp} \otimes A_{ch}) \otimes F$

where $\otimes$ denotes element-wise multiplication, $x_{H \times W}$ is the average value of each spatial element, $y_{C}$ is the average value along the spatial dimensions, $\sigma$ is the sigmoid function, $F'$ is the output of PfAAM, and $F$ is the input feature map. Unlike attention modules that highlight features through learned parameters, PfAAM is parameterless and emphasizes features solely through spatial and channel-wise averaging.
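Equations (1)-(3) translate directly into a few lines of PyTorch. The sketch below is a straightforward reading of the formulas, not necessarily the authors' exact implementation:

```python
import torch
import torch.nn as nn

class PfAAM(nn.Module):
    """Parameter-free average attention (Eqs. 1-3): spatial attention is the
    channel-wise mean, channel attention is the spatial mean, and their
    broadcast product is passed through a sigmoid gate that rescales the input."""
    def forward(self, x):                         # x: (B, C, H, W)
        a_sp = x.mean(dim=1, keepdim=True)        # (B, 1, H, W), Eq. (1)
        a_ch = x.mean(dim=(2, 3), keepdim=True)   # (B, C, 1, 1), Eq. (2)
        return torch.sigmoid(a_sp * a_ch) * x     # Eq. (3), broadcast over C, H, W

# Toy usage: the module has no learnable parameters and preserves shape
print(PfAAM()(torch.randn(2, 32, 20, 20)).shape)  # torch.Size([2, 32, 20, 20])
```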

Focal and efficient IOU loss

The efficient IOU (EIOU) loss splits the aspect-ratio loss term into separate differences between the predicted width and height and those of the smallest enclosing box, speeding up convergence and improving regression accuracy. In addition, the focal loss is introduced into EIOU to address sample imbalance in the bounding-box regression task: when box regression is optimized, anchor boxes with little overlap with the target bounding box contribute less, allowing the regression process to concentrate on high-quality anchor boxes. It is given by:

(4) $L_{EIOU} = L_{IOU} + L_{dis} + L_{asp} = 1 - IOU + \dfrac{\rho^{2}(b, b^{gt})}{(w^{c})^{2} + (h^{c})^{2}} + \dfrac{\rho^{2}(w, w^{gt})}{(w^{c})^{2}} + \dfrac{\rho^{2}(h, h^{gt})}{(h^{c})^{2}}$

(5) $L_{Focal\text{-}EIOU} = IOU^{\gamma} \, L_{EIOU}$

where $L_{IOU}$, $L_{dis}$, and $L_{asp}$ denote the IOU, distance, and aspect losses, respectively; $w^{c}$ and $h^{c}$ are the width and height of the smallest rectangle enclosing the predicted and ground-truth boxes; and $\rho$ denotes the Euclidean distance. The relevant geometric quantities are illustrated in Figure 4, and a sketch of the loss is given below.

Figure 4. Schematic of the Focal EIOU loss.
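For illustration, the following self-contained PyTorch sketch implements Equations (4) and (5) for boxes in (x1, y1, x2, y2) format; the batching, epsilon guards, and mean reduction are our assumptions:

```python
import torch

def focal_eiou_loss(pred, target, gamma=0.4, eps=1e-7):
    """Focal EIOU loss (Eqs. 4-5) for box tensors of shape (N, 4) in
    (x1, y1, x2, y2) format. gamma=0.4 performed best in the ablation below."""
    # Intersection and union for the plain IoU term
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Width and height of the smallest enclosing box (w_c, h_c)
    wc = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    hc = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])

    # Centre-distance term rho^2(b, b_gt) / ((w_c)^2 + (h_c)^2)
    cx_p = (pred[:, 0] + pred[:, 2]) / 2
    cy_p = (pred[:, 1] + pred[:, 3]) / 2
    cx_t = (target[:, 0] + target[:, 2]) / 2
    cy_t = (target[:, 1] + target[:, 3]) / 2
    dist = ((cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2) / (wc ** 2 + hc ** 2 + eps)

    # Aspect terms rho^2(w, w_gt)/(w_c)^2 and rho^2(h, h_gt)/(h_c)^2
    w_p, h_p = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w_t, h_t = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    asp = (w_p - w_t) ** 2 / (wc ** 2 + eps) + (h_p - h_t) ** 2 / (hc ** 2 + eps)

    eiou = 1 - iou + dist + asp              # Eq. (4)
    return (iou ** gamma * eiou).mean()      # Eq. (5): focal re-weighting

# Toy usage with one predicted box and one ground-truth box
pred = torch.tensor([[0.0, 0.0, 2.0, 2.0]])
target = torch.tensor([[1.0, 1.0, 3.0, 3.0]])
print(focal_eiou_loss(pred, target))
```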

Experimental results and discussion

All models were trained on an NVIDIA Quadro RTX 6000 GPU with 24 GB of memory, CUDA 11.0.2, and PyTorch 1.8.0, using a batch size of 16 for 300 epochs. The Adam optimizer (Kingma & Ba, Citation2015) was adopted with a momentum of 0.999 and a learning rate of 1e-3, and a confidence threshold of 0.25 was used for all models. The validation loss was monitored at every epoch, and the model weights with the smallest validation loss over the iterative process were saved. We use the publicly available SeaShips(7000) dataset (Shao et al., Citation2018), which contains 7000 images covering six common ship types. The dataset was randomly divided into 60%, 20%, and 20% subsets for training, validation, and testing, respectively. The results reported in this paper are averages over three random splits, evaluated on the test set.
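The setup described above can be summarized in the following sketch; the model and data are toy stand-ins for the real network and the SeaShips(7000) images, and mapping the reported momentum of 0.999 to Adam's β₂ is our assumption:

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, random_split

# Toy stand-ins: a dummy module and random tensors replace the real
# YOLOSeaShip network and the SeaShips(7000) images, which are not given here.
model = nn.Conv2d(3, 16, 3)
dataset = TensorDataset(torch.randn(100, 3, 64, 64))  # 7000 images in the real setup

# 60% / 20% / 20% random split, as in the experimental setup
n = len(dataset)
n_train, n_val = int(0.6 * n), int(0.2 * n)
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n - n_train - n_val],
    generator=torch.Generator().manual_seed(0),
)

# Adam with lr = 1e-3; the reported momentum 0.999 presumably maps to beta2
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

# Keep the checkpoint whose validation loss is smallest across 300 epochs
best_val = float("inf")
for epoch in range(300):
    # ... one training pass over train_set would go here ...
    val_loss = torch.rand(1).item()  # placeholder for the real validation loss
    if val_loss < best_val:
        best_val = val_loss
        torch.save(model.state_dict(), "best.pt")
```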

The results of YOLOSeaShip

The models were evaluated on the SeaShips(7000) dataset using precision, recall, and mean average precision (mAP), defined as:

(6) $\mathrm{Precision} = \dfrac{TP}{TP + FP}$

(7) $\mathrm{Recall} = \dfrac{TP}{TP + FN}$

(8) $AP = \displaystyle\int_{0}^{1} P(R)\, dR$

(9) $mAP = \dfrac{1}{N} \sum_{i=1}^{N} AP_{i}$

where TP and TN are the counts of correctly classified positive and negative samples, FP is the count of negative samples misclassified as positive, FN is the count of positive samples misclassified as negative, and N is the number of categories.
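As an illustration of Equations (8) and (9), the NumPy sketch below computes AP by all-points interpolation over a precision-recall curve and averages per-class APs into mAP; this interpolation convention is the common Pascal VOC one and may differ from the authors' evaluator:

```python
import numpy as np

def average_precision(recall, precision):
    """Area under the precision-recall curve, Eq. (8), via all-points
    interpolation as used by Pascal VOC / YOLO-style evaluators."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]  # make precision non-increasing
    idx = np.where(r[1:] != r[:-1])[0]        # points where recall changes
    return np.sum((r[idx + 1] - r[idx]) * p[idx + 1])

def mean_average_precision(aps):
    """Eq. (9): mAP is the mean of the per-class AP values."""
    return sum(aps) / len(aps)

# Toy usage with a hypothetical PR curve for one class
rec = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
prec = np.array([1.0, 0.95, 0.9, 0.8, 0.6])
print(average_precision(rec, prec))
```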

The training metrics of YOLOSeaShip are shown in Figure 5. All three loss terms decline with oscillation on the training set. On the validation set, the classification loss levels off after 200 epochs, while the box and objectness losses flatten after 250 epochs. As shown in Figure 5(d), precision and recall keep fluctuating after 200 epochs, while mAP gradually levels off; setting the number of epochs to 300 therefore yields suitable training weights. As listed in Table 1, the precision, recall, and mAP on the test set are 0.948, 0.959, and 0.976, respectively. The mAP for general cargo ships, container ships, and passenger ships reached above 0.98, and that for fishing vessels and bulk carriers remained above 0.97; the mAP for every category exceeds 0.95. The examples in Figure 6 show that YOLOSeaShip can effectively locate and classify ships.

Figure 5. The change of metrics during training. (a–c) Loss of box, objectness, and classification; (d) evaluation metrics.

Figure 6. Example detection results. (a) Images; (b) annotations; (c) detections.

Table 1. Performance results of YOLOSeaShip on the validation and test sets.

Ablation experiment

Different model structures

Table 2 lists the metrics and parameter counts of the different model structures. The results show that, after incorporating the PConv structure, the parameter count and detection speed (frames per second, FPS) improve compared with the YOLOv7-tiny model, albeit at the expense of some accuracy. After PfAAM is integrated, the localization ability of the model improves thanks to the weight allocation of the attention mechanism. As seen in Figure 7, coastal buildings can obstruct the positioning and identification of a large hull near the shore, and the hull of a small target in a captured image is often ignored because it receives little weight. The integration of PfAAM clearly improves the localization ability of the model.

Figure 7. The effects of different structures. (a) Annotations; (b) YOLOv7-tiny; (c) YOLOv7-tiny + PConv + PfAAM.

Table 2. Comparison of detection results of different structures.

Loss function

Because the balance between recall and precision is affected by γ, as shown in Figure 8, more attention is paid to mAP when evaluating the overall performance of the model. The results show that when γ = 0.4 the model performs best overall: its mAP is highest while the other metrics also perform well. The adopted Focal EIOU loss was also compared with several other common loss functions to verify its effectiveness. As listed in Table 3, the Focal EIOU loss outperforms the other loss functions on the mAP metric.

Figure 8. The effect of γ on the metrics.

Table 3. Comparison of detection results with different loss functions.

Comparison of different models

YOLOSeaShip was compared with several classical object detection models, as shown in Table 4, by computing each method's mAP, parameter count, GFLOPs, and FPS. The mAP of YOLOSeaShip reaches 0.976, much higher than all the other algorithms except YOLOv7. In terms of detection speed, its 119.84 FPS is higher than that of all compared methods, and its parameter count and GFLOPs are the smallest. The results show that YOLOSeaShip has good overall performance in ship detection, making efficient, high-precision ship detection and identification possible.

Table 4. Comparison of detection results with different methods.

Conclusion

In this study, we proposed a lightweight network architecture to address the issues of ship detection. The main conclusions are as follows:

  1. A lightweight network framework based on YOLOv7-tiny, called YOLOSeaShip, was proposed. Results on SeaShips(7000) demonstrated that the model achieves a high mAP of 0.976 and a fast detection speed of 119.84 FPS.

  2. The addition of partial convolution reduces the model parameters and improves operation speed. Without adding any parameters, the parameter-free average attention module enhances the capacity to locate both large and small objects.

  3. Compared with other loss functions, the Focal EIOU loss gave better results, obtaining the best metrics on the dataset when γ = 0.4. Compared with several classical models, the proposed method runs faster while maintaining accuracy, giving it excellent prospects for promoting the intelligent construction of border and coastal defense.

Although our network improves inference speed, it does so at some expense of precision. In future work, our goal is to study lightweight models with higher precision.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

All data, models, and code generated or used during the study appear in the submitted article.

Additional information

Funding

This work was supported by the National Natural Science Foundation of China (No. 62102227); the Zhejiang Basic Public Welfare Research Project (Nos. LZY22E050001, LZY22D010001, LGG19E050013, LZY21E060001, LTGC23E050001, LTGS23E030001, LZY24E050001); and the Science and Technology Major Projects of Quzhou (Nos. 2021K29, 2022K56, 2022K92, 2023K221, 2023K211).

References

  • Chen, J., Kao, S., He, H., Zhou, W., Wen, S., Lee, C., & Chan, S. (2023). Run, don’t walk: Chasing higher FLOPS for faster neural networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 12021–12031).
  • Chen, Z., Liu, C., Filaretov, V. F., & Yukhimets, D. A. (2023). Multi-scale ship detection algorithm based on YOLOv7 for complex scene SAR images. Remote Sensing, 15(8), 2071. https://doi.org/10.3390/rs15082071
  • Chen, Y., Zhang, C., Qiao, T., Xiong, J., & Liu, B. (2021). Ship detection in optical sensing images based on YOLOv5. Twelfth International Conference on Graphics and Image Processing (pp. 102–106). https://doi.org/10.1117/12.2589395
  • Escorcia-Gutierrez, J., Gamarra, M., Beleño, K., Soto, C., & Mansour, R. F. (2022). Intelligent deep learning-enabled autonomous small ship detection and classification model. Computers and Electrical Engineering, 100, 107871. https://doi.org/10.1016/j.compeleceng.2022.107871
  • Girshick, R. (2015). Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1440–1448). https://arxiv.org/abs/1504.08083
  • Guo, J., Feng, H., Xu, H., Yu, W., & Ge, S. (2023). D3-net: Integrated multi-task convolutional neural network for water surface deblurring, dehazing and object detection. Engineering Applications of Artificial Intelligence, 117, 105558. https://doi.org/10.1016/j.engappai.2022.105558
  • He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2961–2969). https://arxiv.org/abs/1703.06870
  • Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., & Adam, H. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. https://doi.org/10.48550/arXiv.1704.04861
  • Jiang, B., Luo, R., Mao, J., Xiao, T., & Jiang, Y. (2018). Acquisition of localization confidence for accurate object detection. Proceedings of the European conference on computer vision (pp. 784–799). https://doi.org/10.48550/arXiv.1807.11590
  • Jiang, Y., Peng, T., & Tan, N. (2019). CP-SSD: Context information scene perception object detection based on SSD. Applied Sciences, 9(14), 2785. https://doi.org/10.3390/app9142785
  • Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (pp. 1–15). https://doi.org/10.48550/arXiv.1412.6980
  • Körber, N. (2022). Parameter-free average attention improves convolutional neural network performance (almost) free of charge. https://doi.org/10.48550/arXiv.2210.07828
  • Patel, K., Bhatt, C., & Mazzeo, P. L. (2022). Improved ship detection algorithm from satellite images using YOLOv7 and graph neural network. Algorithms, 15(12), 473. https://doi.org/10.3390/a15120473
  • Perera, L. P., & Soares, C. G. (2015). Collision risk detection and quantification in ship navigation with integrated bridge systems. Ocean Engineering, 109, 344–354. https://doi.org/10.1016/j.oceaneng.2015.08.016
  • Redmon, J., & Farhadi, A. (2018). YOLOv3: An incremental improvement. https://doi.org/10.48550/arXiv.1804.02767
  • Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6), 1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031
  • Rezatofighi, H., Tsoi, N., Gwak, J. Y., Sadeghian, A., & Savarese, S. (2019). Generalized intersection over union: A metric and a loss for bounding box regression. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 658–666). https://doi.org/10.48550/arXiv.1902.09630
  • Shao, Z., Wu, W., Wang, Z., Du, W., & Li, C. Y. (2018). SeaShips: A large-scale precisely annotated dataset for ship detection. IEEE Transactions on Multimedia, 20(10), 2593–2604. https://doi.org/10.1109/TMM.2018.2865686
  • Shi, G., Zhang, Y., & Zeng, M. (2021). A fast workpiece detection method based on multi-feature fused SSD. Engineering Computations, 38(10), 3836–3852. https://doi.org/10.1108/EC-10-2020-0589
  • Wang, C. Y., Bochkovskiy, A., & Liao, H. Y. M. (2023). YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 7464–7475). https://doi.org/10.48550/arXiv.2207.02696
  • Wang, X., Hua, X., Xiao, F., Li, Y., Hu, X., & Sun, P. (2018). Multi-object detection in traffic scenes based on improved SSD. Electronics, 7(11), 302. https://doi.org/10.3390/electronics7110302
  • Wen, G., Cao, P., Wang, H., Chen, H., Liu, X., Xu, J., & Zaiane, O. (2023). MS-SSD: Multi-scale single shot detector for ship detection in remote sensing images. Applied Intelligence, 53(2), 1586–1604. https://doi.org/10.1007/s10489-022-03549-6
  • Wu, J., Dong, J., Nie, W., & Ye, Z. (2023). A lightweight YOLOv5 optimization of coordinate attention. Applied Sciences, 13(3), 1746. https://doi.org/10.3390/app13031746
  • Wu, W., Li, X., Hu, Z., & Liu, X. (2023). Ship detection and recognition based on improved YOLOv7. Computers Materials & Continua, 76(1), 489–498. https://doi.org/10.32604/cmc.2023.039929
  • Xu, X., Zhang, X., & Zhang, T. (2022). Lite-YOLOv5: A lightweight deep learning detector for on-board ship detection in large-scene sentinel-1 sar images. Remote Sensing, 14(4), 1018. https://doi.org/10.3390/rs14041018
  • Yao, J., Qi, J., Zhang, J., Shao, H. M., Yang, J., & Li, X. (2021). A real-time detection algorithm for kiwifruit defects based on YOLOv5. Electronics, 10(14), 1711. https://doi.org/10.3390/electronics10141711
  • Zhang, Y. F., Ren, W., Zhang, Z., Jia, Z., Wang, L., & Tan, T. N. (2022). Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing, 506, 146–157. https://doi.org/10.1016/j.neucom.2022.07.042
  • Zhang, S., Wu, R., Xu, K., Wang, J., & Sun, W. (2019). R-CNN-based ship detection from high resolution remote sensing imagery. Remote Sensing, 11(6), 631. https://doi.org/10.3390/rs11060631
  • Zhang, T., & Zhang, X. (2019). High-speed ship detection in SAR images based on a grid convolutional neural network. Remote Sensing, 11(10), 1206. https://doi.org/10.3390/rs11101206
  • Zhang, T., Zhang, X., Shi, J., & Wei, S. (2019). Depthwise separable convolution neural network for high-speed SAR ship detection. Remote Sensing, 11(21), 2483. https://doi.org/10.3390/rs11212483
  • Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., & Ren, D. (2020). Distance-IoU loss: Faster and better learning for bounding box regression. Proceedings of the AAAI Conference on Artificial Intelligence (pp. 12993–13000). https://doi.org/10.1609/aaai.v34i07.6999
  • Zheng, Z., Wang, P., Ren, D., Liu, W., Ye, R. G., Hu, Q. H., & Zuo, W. M. (2021). Enhancing geometric factors in model learning and inference for object detection and instance segmentation. IEEE Transactions on Cybernetics, 52(8), 8574–8586. https://doi.org/10.1109/TCYB.2021.3095305
  • Zhen, R., Ye, Y., Chen, X., & Xu, L. (2023). A novel intelligent detection algorithm of aids to navigation based on improved YOLOv4. Journal of Marine Science and Engineering, 11(2), 452. https://doi.org/10.3390/jmse11020452
  • Zhou, W., & Chen, P. (2021). A deep attention mechanism method for maritime salient ship detection in complex sea background. Optoelectronics Letters, 17(7), 438–443. https://doi.org/10.1007/s11801-021-0137-z
  • Zhu, B., Xiao, G., Zhang, Y., & Gao, H. (2023). Multi-classification recognition and quantitative characterization of surface defects in belt grinding based on YOLOv7. Measurement, 216, 112937. https://doi.org/10.1016/j.measurement.2023.112937