Research Article

DWS-YOLO: A Lightweight Detector for Blood Cell Detection

Article: 2318673 | Received 14 Jun 2023, Accepted 04 Feb 2024, Published online: 22 Feb 2024

ABSTRACT

Peripheral blood cell detection is an essential component of medical practice, used to diagnose and treat diseases and to monitor the progress of therapies. Our objective is to construct an efficient deep learning model for peripheral blood cell analysis that achieves an optimized balance between inference speed, computational complexity, and detection accuracy. In this article, we propose DWS-YOLO, a lightweight blood cell detector. Our model includes several improved modules: a lightweight C3 module, an enhanced combined attention mechanism, the Scylla-IoU loss function, and an improved soft non-maximum suppression. The improved attention mechanism, loss function, and suppression enhance detection accuracy, while the lightweight C3 module reduces computation. The experimental results demonstrate that our proposed modules enhance a detector's performance, obtaining new state-of-the-art (SOTA) results and excellent robustness on the BCCD dataset. On the white blood cell detection dataset (Raabin-WBC), the generalization performance of the proposed detector is confirmed to be satisfactory. Our proposed blood cell detector achieves high detection accuracy while requiring few computational resources, making it well suited for resource-limited but efficiency-demanding medical device environments and providing a reliable, advanced solution that greatly improves the efficiency and effectiveness of peripheral blood cell analysis in clinical practice.

Introduction

Identifying and analyzing distinct blood cells is crucial for the diagnosis and treatment of diseases in the field of biomedicine. The three most essential components of blood are white blood cells (WBCs), red blood cells (RBCs), and platelets (Atkins et al. Citation2017). The pathological state of humans and animals can be revealed by the accurate and efficient detection of these three types of blood cells (Chan et al. Citation2013): when any of them increases or decreases abnormally, certain symptoms may already be present in the body. Thus, blood cell detection has become one of the most important procedures in routine blood testing during physical examinations. Traditionally, blood cells are detected by microscopic examination (Choudhary et al. Citation2021, Citation2023). However, this is time-consuming, and the fatigue and experience level of the testing personnel affect the accuracy of the detection. Although alternative optical or electrical equipment exists, it is costly and requires specialized training and knowledge (Chao et al. Citation2022; Liu et al. Citation2020).

Earlier blood cell counting and detection methods were mainly based on traditional image processing techniques. Patil, Sable, and Anandgaonkar (Citation2014) used gray-level thresholding to count and detect blood cells: cell regions and non-cell regions were described by pixel gray values and separated by a threshold. Biswas and Ghoshal (Citation2016) proposed a thresholding-estimation-based watershed transformation with a Sobel filter for detecting different blood cells in microscopic images. Moreover, Acharya and Kumar (Citation2017) proposed an approach for counting red blood cells by applying traditional image processing techniques to blood smear images. Overall, these traditional detection and counting methods for blood cells offer limited performance.

In recent years, deep learning-based object detection methods have found wide use in various fields (Sun et al. Citation2023; Wang et al. Citation2023), especially in biomedicine, including abnormality detection and localization in chest X-rays (Islam et al. Citation2017), the detection of diabetic retinopathy in retinal fundus photographs (Gulshan et al. Citation2016), and biomedical testing (He et al. Citation2022; Yadav and Yadav Citation2021; Zhao et al. Citation2022). A few works have used deep learning-based object detection algorithms in microscopic imaging applications, including blood cell detection. Tavakoli et al. (Citation2021) proposed a new fusion segmentation feature extraction technique to detect white blood cells (WBCs). In 2022, Kouzehkanan et al. (Citation2022) proposed the large-scale Raabin-WBC dataset, which contains cell locations and WBC types across a total of 14,514 images. Current deep learning-based blood cell detection methods can be roughly classified into two categories: two-stage and one-stage object detection. Two-stage object detection algorithms, mainly based on R-CNN (Girshick et al. Citation2014), Faster R-CNN (Ren et al. Citation2017), and Mask R-CNN (He et al. Citation2017), generate a series of candidate boxes before classifying them with a convolutional neural network. Kutlu, Avci, and Özyurt (Citation2020) proposed a blood cell detection approach based on R-CNN and transfer learning, but this method is resource-intensive and time-consuming. One-stage object detection algorithms are mainly based on the early YOLO series (Farhadi and Redmon Citation2018; Redmon and Farhadi Citation2017; Redmon et al. Citation2015), SSD (Liu et al. Citation2016), and RetinaNet (Pei et al. Citation2020). Alam and Tariqul Islam (Citation2019) proposed a YOLO-based blood cell detector that uses different classification networks as its backbone, including VGG16, ResNet50, Inceptionv3, MobileNet, and YOLO-tiny; however, its performance on the blood cell detection dataset (BCCD) is weak. Xia et al. (Citation2020) directly applied YOLOv3, YOLOv3-SPP, and YOLOv3-tiny to the same blood cell detection dataset and obtained better results: on BCCD, YOLOv3-SPP achieved an mAP of 88.6%, while the other two detectors achieved mAP values of 80%. However, these models have considerable parameter counts, and their accuracy leaves room for improvement. Besides, Talukdar et al. (Citation2022) compared blood cell detection using the EfficientDet-D3, CenterNet, and Faster R-CNN detectors on a 1392-image private dataset.

Researchers are focusing more on the balance between accuracy and light weight in blood cell detection. Shakarami et al. (Citation2021) proposed a fast and efficient model called FED, which is based on YOLOv3 and achieved 89.86% mAP on BCCD. Xu et al. (Citation2022) proposed the TE-YOLOF (tiny and efficient YOLOF) detector, which uses EfficientNet as the backbone and achieved 91.9% mAP on BCCD. In pursuing lightweight designs, however, the aforementioned methods neglect to rebuild the model around the characteristics of BCCD, such as overlapping blood cells, blurred imaging backgrounds, and poor staining, resulting in a performance loss. There is an urgent need for a more accurate and less complex object detection algorithm for blood cell detection.

The YOLO series (Bochkovskiy, Wang, and Mark Liao Citation2020; Ge et al. Citation2021; Jocher et al. Citation2020; Li et al. Citation2022; Wang, Bochkovskiy, and Mark Liao Citation2022) is one of the most widely used single-stage object detection families, and different versions of YOLO have distinct features. YOLOv5 is distinguished from other versions by its small model size, rapid inference speed, and inexpensive deployment cost. YOLOv5-Nano (Jocher et al. Citation2020), the latest lightweight version of YOLOv5, offers high detection accuracy and low computing resource consumption, requiring only 1.7 million parameters. This work discusses how to find the appropriate balance between detection accuracy, computational complexity, and inference speed. We propose an improved blood cell detector based on YOLOv5-Nano that overcomes the performance loss associated with the TE-YOLOF detector (Xu et al. Citation2022), integrating feature map information more efficiently while simultaneously enhancing blood cell detection performance.

In this paper, we describe DWS-YOLO, a new blood cell detection method. Our proposed detection algorithm is implemented and tested on the classic blood cell detection dataset (BCCD) and the public PASCAL VOC object detection dataset, and the generalization performance of the proposed model is verified on the white blood cell detection dataset (Raabin-WBC). The main contributions of this work are summarized as follows:

  • An enhanced combined attention mechanism (ECAM) is proposed to efficiently capture sensitive information and increase the accuracy of blood cell detection.

  • An inverted residual block with depthwise separable convolution (DSC) is designed to reduce the parameters of the C3 module.

  • The SIoU loss function is used to improve the model’s convergence rate and regression accuracy.

  • The improved soft non-maximum suppression (Soft-NMS) technique is introduced to enhance detection accuracy.

Methods

Overview of DWS-YOLO

The intent of our work is to develop a blood cell detection method that achieves a balance among detection accuracy, computational complexity, and detection speed. In this section, we present an enhanced lightweight blood cell detection algorithm, DWS-YOLO, built on YOLOv5-Nano for its speed, small size, and accuracy.

Figure 1 illustrates an overview of DWS-YOLO, which consists of three modules: the backbone, neck, and head. First, in the input part, the data is processed to increase the accuracy and discrimination of detection. Second, the backbone is mainly composed of Conv, C3, and SPPF modules. Third, the neck adopts a feature pyramid network to enhance localization ability at different scales. Finally, the prediction head uses the loss function to calculate the position, classification, and confidence losses, respectively. Our improvement focuses mostly on the bottleneck module in the neck. We incorporate an inverted residual block with depthwise separable convolution into the module, add channel shuffle to the last layer of the C3 module, and integrate an improved ECAM into the enhanced C3 module to capture sensitive information. Furthermore, the improved Soft-NMS approach is utilized to increase detection accuracy, and the SIoU loss is used to improve the convergence rate and regression accuracy of the model. Table 1 summarizes the main distinctions between our proposed detector, DWS-YOLO, and YOLOv5-Nano.

Figure 1. Network structure of DWS-YOLO.

Table 1. Major differences between proposed detector DWS-YOLO and YOLOv5-nano.

Enhanced combined attention mechanism (ECAM)

Attention mechanisms can assist models in focusing on the most important information in the feature map, filtering out unnecessary features, and improving their ability to detect small targets. Some attention mechanisms, such as squeeze and excitation (SE) (Hu, Shen, and Sun Citation2018), efficient channel attention (ECA) (Wang et al. Citation2020), convolutional block attention module (CBAM) (Woo et al. Citation2018), and coordinate attention (CA) (Hou, Zhou, and Feng Citation2021), have been proven to be effective and helpful in many computer vision tasks (Hang et al. Citation2023; Huang et al. Citation2023; Shao et al. Citation2023; Zhao et al. Citation2023). Various attention mechanisms show different characteristics. For example, SE selectively amplifies the informative channels of features while ignoring the less important and relevant ones by learning a set of channel-wise weights. However, it involves high computational cost and may overfit to the training data. To efficiently localize cross-channel information, ECA employs a fast one-dimensional convolutional operation along the channel dimension of the feature map to capture the interdependencies between channels. But it pays insufficient attention to the spatial information of the feature map. CBAM uses both spatial and channel attention mechanisms to focus on important regions of feature maps, but it only considers local feature map information. Moreover, CA uses a more efficient way to encode both channel relationships and spatial information. It divides channel attention into two parallel one-dimensional feature encoding processes, incorporating spatial coordinate information into the feature maps.

The aforementioned attention mechanisms motivate us to explore a more effective attention mechanism that focuses on both channel and spatial information. Specifically, the advantages of ECA can be applied to CA, achieving cross-channel interaction of spatial information and strengthening the perception of local feature map information. We therefore propose an enhanced combined attention module (ECAM). To efficiently encode spatial information, this module uses the two directions in CA to select channel and spatial information, capturing long-range dependencies along one spatial direction while retaining precise positional relationships along the other. As demonstrated in Figure 2, ECAM consists of three processes: coordinate information embedding, coordinate attention generation, and efficient channel attention generation.

Figure 2. Schematic diagram of an ECAM module. Here, GAP refers to global average pooling. X-GAP and Y-GAP refer to 1D horizontal global pooling and 1D vertical global pooling, respectively.

Coordinate information embedding

In channel attention, Global Average Pooling (GAP) is frequently used to reduce each channel's feature map to a single scalar value, which represents the importance or activation of that channel. A set of channel attention weights is then computed from this scalar value. The GAP is defined by:

(1) \(z_c = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} x_c(i, j)\),

where \(z_c\) is the output associated with the \(c\)-th channel and \(x_c(i, j)\) is the input feature of channel \(c\) at position \((i, j)\).

In order to capture the spatial information of the feature maps, CA divides the GAP into a pair of 1D feature encoding operations, and we follow the processing method of CA. Specifically, two pooling kernels with spatial extents \((H, 1)\) and \((1, W)\) are used to encode each channel along the horizontal and vertical coordinates, respectively. The output of the \(c\)-th channel at height \(h\) is given by

(2) \(z_c^h(h) = \frac{1}{W}\sum_{i=1}^{W} x_c(h, i)\).

Similarly, the output of the c-th channel at width w can be written as

(3) \(z_c^w(w) = \frac{1}{H}\sum_{j=1}^{H} x_c(j, w)\).

The two transformations described above encode the feature map separately along the horizontal and vertical directions to produce two direction-aware feature maps. Both transformations enable the attention module to capture long-range dependencies along one spatial direction and preserve exact location information along the other, thus enabling the model to locate regions of interest more accurately.
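To make Eqs. (2) and (3) concrete, the following minimal PyTorch sketch computes the two directional pooling operations on a dummy feature map; the tensor names are illustrative and not taken from the authors' code.

```python
import torch

x = torch.randn(1, 64, 32, 32)       # (N, C, H, W) dummy feature map

# Eq. (2): X-GAP, 1D horizontal pooling over the width -> (N, C, H, 1)
z_h = x.mean(dim=3, keepdim=True)

# Eq. (3): Y-GAP, 1D vertical pooling over the height -> (N, C, 1, W)
z_w = x.mean(dim=2, keepdim=True)
```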

Coordinate attention generation

During this procedure, two coordinate embeddings encoding the spatial information of each element in the feature map are generated. First, the two feature maps generated by Eq. (2) and Eq. (3) are concatenated and then sent to a \(1\times1\) convolution \(\mathrm{Conv2d}_{1\times1}\), which is given by

(4) \(f = \mathrm{ReLU}\left(\mathrm{Conv2d}_{1\times1}\left(\left[z^h, z^w\right]\right)\right)\),

where the square brackets \([\cdot, \cdot]\) represent the concatenation operation along the spatial dimension, and the rectified linear unit (ReLU) serves as the non-linear activation. The output \(f \in \mathbb{R}^{C/r \times (H+W)}\) is the intermediate feature map that encodes spatial information in both the horizontal and vertical directions, where \(r\) is the downsampling ratio. The intermediate feature map \(f\) is then split along the spatial dimension into two parts, \(f^h \in \mathbb{R}^{C/r \times H}\) and \(f^w \in \mathbb{R}^{C/r \times W}\). Another two \(1\times1\) convolutions \(\mathrm{Conv2d}_{1\times1}\) are utilized to separately transform \(f^h\) and \(f^w\) into tensors with the same number of channels as the input feature map \(X\), and both are expanded to the same size as the input feature map. Finally, the attention weights in the two directions are obtained by using the sigmoid activation function:

(5) \(g^h = \mathrm{Sigmoid}\left(\mathrm{Conv2d}_{1\times1}(f^h)\right)\),
(6) \(g^w = \mathrm{Sigmoid}\left(\mathrm{Conv2d}_{1\times1}(f^w)\right)\).

Efficient channel attention generation

A global average pooling operation is first conducted on the input feature map \(X\). A 1D convolution \(\mathrm{Conv1d}_{k}\) with an adaptively determined kernel size \(k\) is then applied, the weight of each channel is obtained through the sigmoid function, and each weight is multiplied with the corresponding elements of the original input to produce the output feature map. This process is expressed as

(7) \(m_c = \mathrm{Sigmoid}\left(\mathrm{Conv1d}_{k}\left(\mathrm{AvgPool}(X)\right)\right)\).

Then, given the channel dimension C, the kernel size k can be adaptively determined by

(8) \(k = \varphi(C) = \left|\frac{\log_2 C}{\gamma} + \frac{b}{\gamma}\right|_{\mathrm{odd}}\),

where \(\gamma\) and \(b\) are set to 2 and 1 by default, and \(|t|_{\mathrm{odd}}\) denotes the nearest odd number to \(t\).
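As a quick illustration of Eq. (8), the small helper below computes the adaptive kernel size; it is a sketch following the ECA formulation, with the exact rounding convention as our assumption.

```python
import math

def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive 1D-conv kernel size from Eq. (8); |t|_odd rounds to an odd number."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1

print(eca_kernel_size(64))    # C = 64  -> k = 3
print(eca_kernel_size(512))   # C = 512 -> k = 5
```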

The final output of the enhanced combined attention module, Y, can be computed as follows:

(9) \(y_c(i, j) = m_c(i, j) \times g_c^h(i) \times g_c^w(j)\).

Compared to the original CA, our ECAM inherits the advantages of CA and ECA in establishing channel-to-channel connections while retaining channel attention in both the vertical and horizontal directions. It effectively merges local spatial information into the generated feature maps by incorporating efficient channel attention, thereby enabling cross-channel information interaction. Hence, the network's perception of local spatial information on specific channels is enhanced, and more important spatial information is gathered to precisely locate the targets of interest.
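Putting Eqs. (1) through (9) together, the following PyTorch module is a minimal sketch of ECAM. The reduction ratio r, the layer layout, and the final reweighting of the input feature map are our assumptions; the authors' exact implementation may differ.

```python
import math
import torch
import torch.nn as nn

class ECAM(nn.Module):
    # Sketch of the enhanced combined attention module, Eqs. (1)-(9).
    # Reduction ratio r and layer layout are assumptions, not published settings.
    def __init__(self, channels: int, r: int = 32):
        super().__init__()
        mid = max(8, channels // r)
        # coordinate attention branch, Eqs. (4)-(6)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)
        # efficient channel attention branch, Eqs. (7)-(8)
        t = int(abs((math.log2(channels) + 1) / 2))
        k = t if t % 2 == 1 else t + 1
        self.conv1d = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # coordinate information embedding: X-GAP and Y-GAP, Eqs. (2)-(3)
        z_h = x.mean(dim=3, keepdim=True)                        # (n, c, h, 1)
        z_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)    # (n, c, w, 1)
        f = self.relu(self.conv1(torch.cat([z_h, z_w], dim=2)))  # Eq. (4)
        f_h, f_w = torch.split(f, [h, w], dim=2)
        g_h = torch.sigmoid(self.conv_h(f_h))                       # Eq. (5)
        g_w = torch.sigmoid(self.conv_w(f_w.permute(0, 1, 3, 2)))   # Eq. (6)
        # efficient channel attention: GAP -> 1D conv -> sigmoid, Eq. (7)
        m = torch.sigmoid(self.conv1d(x.mean(dim=(2, 3)).unsqueeze(1)))
        m = m.squeeze(1).view(n, c, 1, 1)
        return x * m * g_h * g_w   # Eq. (9), applied as a reweighting of x

attn = ECAM(64)
assert attn(torch.randn(2, 64, 32, 32)).shape == (2, 64, 32, 32)
```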

Improved lightweight C3 module (C3DWS)

In this part, we introduce a lightweight C3 module based on the original C3 module. By increasing the network's depth, the original C3 module of the YOLOv5 network improves the accuracy of the detector. Nevertheless, if a network is too deep, its gradient can become very small as it passes through each layer, making it impossible to update the weights of the network's early layers. This phenomenon, known as gradient vanishing, makes model convergence problematic. Using residual structures is an efficient strategy for solving the gradient vanishing problem (He et al. Citation2016). Besides, depthwise separable convolution significantly improves the speed of several lightweight classification networks, such as Xception (Chollet Citation2017), MobileNets (Howard et al. Citation2017, Citation2019; Sandler et al. Citation2018), and ShuffleNets (Ma et al. Citation2018; Zhang et al. Citation2018). Inspired by the above works, to overcome the problem of gradient vanishing while lowering model complexity, we embed DSC into an inverted residual block and then use this lightweight inverted residual block to construct DWBottleneck, a lightweight bottleneck module.
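A minimal sketch of such a DWBottleneck is given below: an inverted residual block whose spatial filtering is a depthwise 3×3 convolution between two pointwise convolutions. The expansion factor and the SiLU activation are assumptions, not the authors' published settings.

```python
import torch
import torch.nn as nn

class DWBottleneck(nn.Module):
    # Sketch of the lightweight bottleneck: an inverted residual block built
    # from depthwise separable convolution (expansion factor assumed).
    def __init__(self, channels: int, expand: int = 2):
        super().__init__()
        hidden = channels * expand
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),      # pointwise expand
            nn.BatchNorm2d(hidden), nn.SiLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1,
                      groups=hidden, bias=False),            # depthwise 3x3 (DSC)
            nn.BatchNorm2d(hidden), nn.SiLU(),
            nn.Conv2d(hidden, channels, 1, bias=False),      # pointwise project
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # residual shortcut counters gradient vanishing in deep stacks
        return x + self.block(x)

assert DWBottleneck(64)(torch.randn(1, 64, 40, 40)).shape == (1, 64, 40, 40)
```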

DSC reduces the computational cost of convolutional layers. However, it processes each channel independently, cutting off information flow between channels. Many strategies have been proposed to mitigate this flaw of DSC. For example, MobileNets use a large number of \(1\times1\) convolutions to fuse the independently computed channel information, while ShuffleNets use channel shuffling to enable the interaction of channel information. To address this issue, we incorporate channel shuffling into our C3 module, performed on the feature information produced after the two branches of the improved C3 module are concatenated, as sketched below.
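The channel shuffle step itself is the standard ShuffleNet operation; a sketch follows, with groups=2 as our assumption matching the two concatenated branches.

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int = 2) -> torch.Tensor:
    # ShuffleNet-style channel shuffle: interleave channel groups so that
    # information computed independently per channel can mix across groups.
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w)   # split channels into groups
    x = x.transpose(1, 2).contiguous()         # interleave the groups
    return x.view(n, c, h, w)
```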

It is optimal to employ a feature enhancement module after the dimension transfer of the feature map in the C3 module and before entering the next convolution or bottleneck block (Li et al. Citation2022). Typically, when convolution is employed for downsampling, the width and height dimensions decrease while the channel size increases. Hence, we suggest employing local feature enhancement methods, such as attention modules, that have simple structures and low computing costs. We incorporate the ECAM presented in the preceding section into the improved C3 module to further process the feature map after the channel shuffle, with the expectation that the model will pay more attention to high-quality feature map information. In this way, low-level feature information in the network's superficial layers and high-level feature information in its deeper layers are fully integrated. This improved module is called C3DWS, and its structure is depicted in Figure 3.

Figure 3. Schematic diagram of the improved lightweight C3 module (C3DWS).

Note that we do not use C3DWS at every stage of the model; if we did, the network would be much deeper, which would increase the inference time. By the time the feature maps reach the neck, after numerous rounds of convolution and pooling, their spatial size has gradually decreased to a minimum while the number of channels has grown to its maximum. Thus, we only use C3DWS in the neck, resulting in a thin neck on top of a standard backbone structure.

Scylla intersection over union loss (SIoU)

Our proposed model utilizes the SIoU loss as the bounding box loss function. IoU-based losses (Yu et al. Citation2016) are widely applied in deep learning-based detectors; they increase the positional accuracy of the regressed bounding boxes, which helps achieve faster convergence during training. There is a growing number of IoU variants, such as CIoU (Zheng et al. Citation2021), EIoU (Zhang et al. Citation2022), and most recently SIoU (Gevorgyan Citation2022), which are also employed in some biomedical detection tasks (Hwang et al. Citation2022; Liu Citation2022; Pacal and Karaboga Citation2021; Yang et al. Citation2022). The CIoU loss is currently the most widely used loss function in anchor-based detectors; it takes into account the center point distance, overlap area, and aspect ratio of the predicted and ground truth boxes. However, it does not consider the direction mismatch between the ground truth bounding box and the predicted bounding box. As a result, the predicted bounding boxes may drift during training, leading to slower convergence and reduced accuracy.

To avoid this problem, we replace the original loss function with SIoU, which consists of four cost functions: angle cost, distance cost, shape cost, and IoU cost. SIoU is defined by

(10) \(L_{box} = 1 - IoU + \frac{\Delta + \Omega}{2}\),

where \(IoU\) is the intersection-over-union cost, and \(\Delta\) and \(\Omega\) are the distance cost and the shape cost, respectively. The IoU cost is defined as

(11) \(IoU = \frac{|A \cap B|}{|A \cup B|}\),

where \(A\) represents the predicted box and \(B\) represents the ground truth box. The distance cost incorporates the angle cost between the two boxes and is given by

(12) \(\Delta = 2 - e^{-(2-\Lambda)\left(\frac{b_{cx}^{gt} - b_{cx}}{c_w}\right)^{2}} - e^{-(2-\Lambda)\left(\frac{b_{cy}^{gt} - b_{cy}}{c_h}\right)^{2}}\),

where \(\Lambda\) is the angle cost, which drives the predicted box to move quickly toward the horizontal or vertical line on which the ground truth box lies. The angle cost is given by

(13) \(\Lambda = 1 - 2\sin^2\left(\arcsin(x) - \frac{\pi}{4}\right)\),
(14) \(x = \frac{\max(b_{cy}^{gt},\, b_{cy}) - \min(b_{cy}^{gt},\, b_{cy})}{\sqrt{\left(b_{cx}^{gt} - b_{cx}\right)^2 + \left(b_{cy}^{gt} - b_{cy}\right)^2}}\),

where \(b_{cx}^{gt}\) and \(b_{cy}^{gt}\) are the center coordinates of the ground truth box, \(b_{cx}\) and \(b_{cy}\) are the center coordinates of the predicted box, and \(c_w\) and \(c_h\) are the width and height of the minimum enclosing rectangle of the ground truth and predicted boxes. The shape cost is defined as

(15) \(\Omega = \left(1 - e^{-\frac{|w - w^{gt}|}{\max(w,\, w^{gt})}}\right)^{\theta} + \left(1 - e^{-\frac{|h - h^{gt}|}{\max(h,\, h^{gt})}}\right)^{\theta}\),

where \(w^{gt}\) and \(h^{gt}\) represent the width and height of the ground truth box, \(w\) and \(h\) represent the width and height of the predicted box, and \(\theta\) controls the degree of attention paid to the shape cost.

SIoU takes into account the angle between the desired and actual regression vectors and redefines the penalty metric accordingly. By incorporating the angle cost into the loss function, the direction mismatch between the ground truth and the predicted bounding box is effectively resolved, the training and inference of the model are greatly improved, and convergence during the training stage is accelerated.
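For reference, the sketch below assembles Eqs. (10) through (15) into a batched SIoU loss for boxes in (x1, y1, x2, y2) format; theta = 4 follows the SIoU paper, and the numerical-stability constants are our additions.

```python
import math
import torch

def siou_loss(pred, target, theta: float = 4.0, eps: float = 1e-7):
    """Sketch of the SIoU loss, Eqs. (10)-(15); pred/target are (N, 4) tensors."""
    # intersection over union, Eq. (11)
    ix1, iy1 = torch.max(pred[:, 0], target[:, 0]), torch.max(pred[:, 1], target[:, 1])
    ix2, iy2 = torch.min(pred[:, 2], target[:, 2]), torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # box centres and the minimum enclosing rectangle
    pcx, pcy = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    tcx, tcy = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])

    # angle cost, Eqs. (13)-(14)
    sigma = torch.sqrt((tcx - pcx) ** 2 + (tcy - pcy) ** 2) + eps
    sin_alpha = (torch.max(pcy, tcy) - torch.min(pcy, tcy)) / sigma
    angle = 1 - 2 * torch.sin(torch.arcsin(sin_alpha.clamp(-1, 1)) - math.pi / 4) ** 2

    # distance cost, Eq. (12)
    rho_x = ((tcx - pcx) / (cw + eps)) ** 2
    rho_y = ((tcy - pcy) / (ch + eps)) ** 2
    dist = 2 - torch.exp(-(2 - angle) * rho_x) - torch.exp(-(2 - angle) * rho_y)

    # shape cost, Eq. (15)
    w_p, h_p = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w_t, h_t = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    omega = ((1 - torch.exp(-(w_p - w_t).abs() / (torch.max(w_p, w_t) + eps))) ** theta
             + (1 - torch.exp(-(h_p - h_t).abs() / (torch.max(h_p, h_t) + eps))) ** theta)

    return 1 - iou + (dist + omega) / 2    # Eq. (10)
```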

Improved soft non-maximum suppression (Im-Soft-NMS)

Before the post-processing stage of object detection, numerous high-confidence bounding boxes surround the target objects. Non-maximum suppression (NMS) is widely used to eliminate redundant anchor boxes, leaving only one high-confidence bounding box per object. NMS begins with the box with the highest confidence score and eliminates other boxes that significantly overlap it; this process is repeated until no overlapping bounding boxes remain to be deleted. For an image with many objects, if different objects overlap heavily and the overlap ratio of the corresponding anchor boxes exceeds the threshold, the anchor box with the lower confidence score is eliminated and the related object may go undetected, diminishing the overall effectiveness of detection. This problem occurs because the original NMS sets the confidence score of adjacent anchor boxes whose IoU exceeds the threshold to zero, eliminating them outright.

The original NMS is not suitable for BCCD because the BCCD images contain many overlapping cells. Therefore, we adopt the soft non-maximum suppression proposed by Bodla et al. (Citation2017) and improve it to tackle the aforementioned problem without increasing its complexity. Soft-NMS introduces a Gaussian weighted penalty function that reduces the confidence scores of adjacent anchor boxes instead of eliminating them, thereby preserving these boxes. The score calculation of Soft-NMS is given by

(16) \(S_i = \begin{cases} S_i, & IoU(M, B_i) < N_t \\ S_i\, e^{-\frac{IoU(M, B_i)^2}{\sigma}}, & IoU(M, B_i) \ge N_t \end{cases}\)

where \(M\) is the anchor box with the highest confidence, \(B_i\) is the anchor box to be examined, \(IoU(M, B_i)\) is the IoU between \(M\) and \(B_i\), and \(N_t\) is the IoU threshold; \(S_i\) on the right-hand side is the original confidence score of \(B_i\), the left-hand side gives the updated soft confidence score, and \(\sigma\) is the exponential penalty parameter. \(N_t\) and \(\sigma\) are typically set to 0.30 and 0.10, respectively. As seen from Eq. (16), when \(IoU(M, B_i)\) reaches or exceeds the threshold, the confidence decays exponentially.

Traditional IoU-based NMS compares the IoU values of every pair of bounding boxes in an image. Zheng et al. (Citation2020) proposed DIoU-NMS to extend this idea by taking the spatial distance between bounding boxes into account: in addition to the IoU value, the distance between the center points of two boxes is considered when deciding which box to suppress. DIoU-NMS is formally defined as

(17) \(S_i = \begin{cases} S_i, & IoU(M, B_i) - \frac{\rho^2(b, b^{gt})}{c^2} < \varepsilon \\ 0, & IoU(M, B_i) - \frac{\rho^2(b, b^{gt})}{c^2} \ge \varepsilon \end{cases}\)

where \(\varepsilon\) is the NMS threshold, \(b\) and \(b^{gt}\) represent the center points of the predicted box and the ground truth box respectively, \(\rho(\cdot)\) is the Euclidean distance, and \(c\) is the diagonal length of the minimum enclosing box covering the two boxes.

Inspired by the above works, we propose an improved non-maximum suppression algorithm. On the basis of Soft-NMS, SIoU is introduced so that object scale and distance are taken into consideration in the overlap term, and the angle between the expected regressions is considered at the same time. The Gaussian weighted penalty function of the improved Soft-NMS is therefore given by

(18) \(S_i = \begin{cases} S_i, & IoU(M, B_i) - \frac{\Delta + \Omega}{2} < N_t \\ S_i\, e^{-\frac{IoU(M, B_i)^2}{\sigma}}, & IoU(M, B_i) - \frac{\Delta + \Omega}{2} \ge N_t \end{cases}\)

where \(\Delta\) and \(\Omega\) represent the distance cost and shape cost defined in Eqs. (12) and (15), respectively. The improved Soft-NMS begins with the box with the highest confidence score and decays the scores of boxes that overlap it heavily in the SIoU sense. Thanks to the Gaussian weighted penalty function and SIoU, it maintains good performance even when a large number of cells overlap.
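The following sketch mirrors Eq. (18): the suppression test uses the SIoU-style overlap IoU − (Δ + Ω)/2, while the Gaussian decay keeps the plain IoU, as in the formula above. Here `siou_terms` is a hypothetical helper returning the per-box (IoU, Δ, Ω) of Eqs. (11), (12), and (15); the thresholds default to the \(N_t\) = 0.30 and \(\sigma\) = 0.10 mentioned earlier.

```python
import torch

def im_soft_nms(boxes, scores, nt=0.30, sigma=0.10, score_thresh=0.001):
    """Sketch of the improved Soft-NMS of Eq. (18); boxes: (N, 4), scores: (N,)."""
    kept = []
    boxes, scores = boxes.clone(), scores.clone()
    while scores.numel() > 0:
        i = int(scores.argmax())                 # box with highest confidence (M)
        kept.append((boxes[i], float(scores[i])))
        mask = torch.arange(scores.numel(), device=scores.device) != i
        m, boxes, scores = boxes[i:i + 1], boxes[mask], scores[mask]
        if scores.numel() == 0:
            break
        # hypothetical helper: per-box IoU, distance cost, shape cost vs. M
        iou, delta, omega = siou_terms(m, boxes)
        overlap = iou - (delta + omega) / 2      # SIoU-style suppression test
        decay = torch.exp(-(iou ** 2) / sigma)   # Gaussian weighted penalty
        scores = torch.where(overlap >= nt, scores * decay, scores)
        keep = scores > score_thresh             # drop boxes with negligible scores
        boxes, scores = boxes[keep], scores[keep]
    return kept
```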

Results

We evaluate our method on three publicly available datasets: BCCD (Roboflow Citation2021), Raabin-WBC (Kouzehkanan et al. Citation2022), and PASCAL VOC (Everingham et al. Citation2015). PASCAL VOC is an object detection benchmark available in two versions, PASCAL VOC 2007 and PASCAL VOC 2012; the former contains 9963 images and the latter 11,540 images. To verify the effectiveness of our proposed modules, such as C3DWS, we conduct ablation experiments on the PASCAL VOC dataset. We randomly split the PASCAL VOC 2007 + 12 dataset into training, validation, and test sets, with 70% of the data allocated to the training set, 20% to the validation set, and 10% to the test set (see the sketch below). To validate the overall performance of the proposed cell detector, we compare our method with existing blood cell detection approaches on the BCCD dataset. BCCD is a publicly available dataset of microscopic blood smear images used for blood cell detection and counting tasks; it contains 364 images of three types of blood cells (red blood cells, white blood cells, and platelets), each measuring 640 by 480 pixels. We resize every image to 416×416. The BCCD dataset is also divided into training, validation, and test sets in a ratio of 7:2:1. The Raabin-WBC dataset consists of 14,514 images with cell locations and WBC types; in our work, it is mainly used for cross-domain experiments to verify the generalization of our proposed model.
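A seeded random split in the reported 7:2:1 proportions can be sketched as follows; the function name and seed are illustrative.

```python
import random

def split_dataset(items, seed=0):
    # Random 7:2:1 split into training / validation / test subsets.
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train, n_val = int(0.7 * len(items)), int(0.2 * len(items))
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_dataset(range(364))   # e.g. the 364 BCCD images
```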

Training setting and hyper-parameter

We use the stochastic gradient descent (SGD) optimizer with a batch size of 16. The initial learning rate is set to 0.01, the momentum factor to 0.937, and the weight decay to 0.0005. The model input image size is 640×640, and training runs for a total of 300 epochs. These training settings are used uniformly for all subsequent ablation and comparison experiments. All experiments are conducted using PyTorch on a single NVIDIA GeForce RTX 3090 GPU (24 GB), and all models are trained from scratch.
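The optimizer settings above translate directly into PyTorch; a minimal sketch follows, with a placeholder module standing in for the detector.

```python
import torch

model = torch.nn.Conv2d(3, 16, 3)   # placeholder standing in for the detector
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.01,             # initial learning rate
                            momentum=0.937,      # momentum factor
                            weight_decay=0.0005)
```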

Evaluation metrics

In order to better evaluate and compare the detectors, precision rate (P), recall rate (R), average precision (AP), and mean average precision (mAP) are used as evaluation metrics. They can be formulated as follows:

(19) \(P = \frac{TP}{TP + FP}\),
(20) \(R = \frac{TP}{TP + FN}\),
(21) \(AP = \int_0^1 P(R)\, dR\),
(22) \(mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i\),

where TP, FP, and FN denote the numbers of true positive, false positive, and false negative samples, respectively. Precision is the proportion of correctly predicted positive instances among all predicted positives, whereas recall is the proportion of true positives among all relevant instances in the dataset. AP is the area under the precision-recall (P-R) curve and assesses the overall performance of each category on the test model. mAP averages the APs of all categories to measure the overall detection capability of a detector. mAP0.5 is the average precision over all categories at an IoU threshold of 0.5, and mAP0.95 is the average precision averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05. Using mAP as the main evaluation metric shows which detector performs best overall. In addition, frames per second (FPS), the number of parameters (Params), and floating-point operations (FLOPs) are used to compare the inference speed, space complexity, and computational complexity of the detectors.
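As a concrete reading of Eqs. (21) and (22), the sketch below computes AP with the all-point interpolation commonly used for VOC-style evaluation (our assumption) and averages the per-class APs into mAP.

```python
import numpy as np

def average_precision(recall, precision):
    # Area under the P-R curve, Eq. (21), with all-point interpolation.
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]      # monotone precision envelope
    idx = np.where(r[1:] != r[:-1])[0]            # points where recall changes
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

def mean_average_precision(aps):
    # Eq. (22): mean of the per-class APs.
    return sum(aps) / len(aps)
```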

Ablation study

We conduct ablation experiments on PASCAL VOC 2007 + 12 to evaluate the contribution of all the components described in the Methods section, including ECAM, SIoU, C3DWS, and improved Soft-NMS.

Attention mechanism

As shown in Table 2, we evaluate the performance of the YOLOv5-Nano detector with five different attention modules, namely CBAM, ECA, CA, SE, and ECAM. We use the same hyperparameters and embed each of the five attention modules into the C3 module of the neck for a fair comparison. Compared with the baseline, the five attention mechanisms improve the detector on the aforementioned indicators to varying degrees. ECAM performs best, with mAP0.5 and mAP0.95 reaching 68.1% and 41.7%, respectively, increases of 1.4% and 2.6% over the baseline.

Table 2. The performance comparisons of employment of different attention modules (PASCAL VOC 2007 + 12 dataset).

Loss function

We evaluate the improvement from the adopted SIoU loss function and report the results in Table 3. SIoU obtains a 73.3% precision rate, 61.5% recall rate, 66.9% mAP0.5, and 40.4% mAP0.95, showing varying degrees of improvement over both CIoU and EIoU.

Table 3. The comparison of the effects of CIoU/EIoU/SIoU loss functions (GPU: RTX 3090; dataset: PASCAL VOC 2007 + 12).

Component analysis

We conduct single-factor and multiple-factor experiments to evaluate how the aforementioned components influence the performance of the proposed detector.

  • In the single-factor experiment, we use YOLOv5-Nano as a baseline to compare the efficacy of each module. Table 4 shows the results of the ablation experiment. Individually, the four modules improve different indicators to varying degrees. Compared with the baseline, the C3DWS module reduces parameters by 12.6% and FLOPs by 9.3%; the ECAM module increases the precision rate by 1.9%; and the improved Soft-NMS increases mAP0.5 and mAP0.95 by 1.8% and 6.4%, respectively.

    Table 4. The single-factor ablation studies with four implements (GPU: RTX 3090; dataset: PASCAL VOC 2007 + 12).

  • In the multi-factor experiment, we also use YOLOv5-Nano as a baseline and compare it to four models containing various combinations of the four components, where each combination adds one module on top of the previous ones. Table 5 shows the results of the ablation experiment. The module combinations improve different indicators to varying degrees. In particular, the combination of C3DWS, ECAM, SIoU, and improved Soft-NMS reaches 70.7% mAP0.5 and 47.0% mAP0.95, which are 4.0% and 7.9% higher than the baseline.

    Table 5. The multi-factor ablation studies with four implements (GPU: RTX 3090; dataset: PASCAL VOC 2007 + 12).

Comparison with previous blood cell detection works

We evaluate the performance of the proposed DWS-YOLO detector on the BCCD dataset and compare it with previous blood cell detection works (Alam and Tariqul Islam Citation2019; Shakarami et al. Citation2021; Xia et al. Citation2020; Xu et al. Citation2022). Table 6 shows the results of the comparative experiments. The blood cell detection methods used in previous studies have become lighter as research advances, but none achieves an optimal balance between the number of model parameters, computational complexity, and detection accuracy. Our method outperforms the previous best method, TE-YOLOF, by 1.2% mAP0.95 while having only 1.56 M parameters and 3.8 G FLOPs. In short, our proposed detector is lighter, more compact, and less computationally complex. We show some visualization results in Figure 4; it is evident that our model can detect tiny cells, specifically platelets, as indicated by the red boxes.

Figure 4. Visualization detection results on the BCCD test set. Top to bottom: (a)–(e) original images; (f)–(j) detection results of DWS-YOLO.

Table 6. The result comparisons with previous blood cell detection works (GPU: RTX 3090; dataset: BCCD; platelets, RBC, and WBC represent the mAP of each category).

Comparison with state-of-the-art object detectors

We compare the proposed detector with four of the most advanced object detection models, including YOLOX-Nano (Ge et al. Citation2021), YOLOX-S (Ge et al. Citation2021), YOLOv6-Nano (Li et al. Citation2022), and YOLOv7-Tiny (Wang, Bochkovskiy, and Mark Liao Citation2022), on the BCCD dataset. For fair comparison, we use the same data preprocessing and hyperparameter settings. As shown in Figure 5, our proposed detector outperforms all of these state-of-the-art object detectors on the majority of metrics, showing that in the field of blood cell detection, our improved model has more advantages than the existing SOTA models.

Figure 5. Model performance comparison with SOTA object detection models. WBC, RBC, and platelets represent the mAP of each category.

Robustness analysis

To evaluate the robustness of the proposed detector, we split the BCCD dataset using four different division ratios for the training, validation, and test sets, and compare the proposed detector with the four advanced object detection models mentioned in Section 3.5. As shown in Figure 6, our proposed detector outperforms all state-of-the-art object detectors at every division ratio, showing that our improved model is more robust than the existing SOTA models in the field of blood cell detection.

Figure 6. Comparison of model performance with the SOTA object detection models: mAP0.5 under different division ratios of the BCCD dataset.

Analysis of ROC curve

To validate and visualize the detection performance of the model, we plot the ROC curve of the DWS-YOLO detector on the BCCD dataset. As shown in Figure 7, the AUC scores for the WBC, RBC, and platelets categories reach 0.982, 0.957, and 0.903, respectively, indicating that our improved model exhibits excellent performance on peripheral blood cell detection.

Figure 7. The ROC curve of the DWS-YOLO detector on the BCCD dataset. WBC, RBC, and platelets represent the AUC score of each category.

Comparison with the baseline in visualization heatmaps

For qualitative analysis, we apply Grad-CAM (Selvaraju et al. Citation2017) to the improved model using images from the BCCD test set. Grad-CAM is a visualization method that uses gradients to calculate the importance of spatial locations in convolutional layers. We compare the visualization results of the baseline model and the model with the embedded ECAM mechanism. As shown in Figure 8, the Grad-CAM mask of the ECAM-integrated model covers the object regions better than the baseline. In other words, the ECAM-integrated model makes good use of the information in the blood cell distribution areas and aggregates the spatial and positional features of blood cells from them.
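A minimal hook-based Grad-CAM sketch is shown below. It assumes a classification-style output of shape (N, num_classes); for a detector, a suitable scalar target (e.g., an objectness or class score) must be chosen instead. The layer choice and function name are illustrative.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_layer, class_idx):
    # Grad-CAM (Selvaraju et al. 2017): weight feature maps by the global
    # average of their gradients, then keep the positive contributions.
    feats, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.append(go[0]))
    score = model(x)[0, class_idx]        # assumes (N, num_classes) output
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)   # GAP over gradients
    cam = F.relu((weights * feats[0]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear",
                        align_corners=False)
    return cam / (cam.max() + 1e-7)       # normalized heatmap in [0, 1]
```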

Figure 8. Grad-CAM visualization results on the BCCD test set. Top to bottom: (a)–(c) original images; (d)–(f) results of the baseline model YOLOv5-Nano; (g)–(i) results of the improved model DWS-YOLO.

Cross-domain experiment

To evaluate the generalization performance of the proposed detector, the DWS-YOLO detector trained on the BCCD dataset is applied directly to the Raabin-WBC dataset. The visualization results are depicted in Figure 9; our detector detects the majority of red blood cells and all white blood cells in the WBC images, demonstrating the outstanding generalization performance of our improved detector.

Figure 9. Visualization detection results on the Raabin-WBC dataset. Top to bottom: (a)–(e) original images; (f)–(j) detection results of DWS-YOLO.

Discussion

One of the challenges of blood cell detection is that blood cells vary greatly in scale, red blood cells adhere and overlap, and platelets and white blood cells are sparse, all of which lead to low detection accuracy. Compared with traditional manual microscope inspection, detection methods that combine blood cell analysis with deep learning can reduce human error and missed detections while improving the accuracy, efficiency, and objectivity of detection. Compared with methods based on image segmentation or classification, they output the category and location of blood cells simultaneously, which is more comprehensive and intuitive. At the same time, past blood cell detection methods have failed to find an appropriate balance between detection accuracy, computational complexity, and inference speed.

Addressing the shortcomings of the past detection methods mentioned above, we propose a lightweight blood cell detector. To tackle the problems of low detection accuracy and large model size in the field of blood cell detection, we use the enhanced combined attention mechanism and introduce an inverted residual structure with depthwise separable convolution, which effectively reduce the number of model parameters and the computational complexity of the model while improving the accuracy of blood cell detection; we also optimize the loss function to speed up the convergence of the blood cell detection model during training. Finally, an improved non-maximum suppression method is proposed to solve the problem of overlapping, adherent blood cells. Compared to past methods, these improvements significantly reduce model size and computational complexity while maintaining high detection accuracy, which is important for achieving fast and accurate blood cell detection.

In addition, real-time, accurate blood cell detection technology can be integrated into portable or mobile medical devices to enable Point-of-Care (POC) and telemedicine services, allowing patients to receive timely, high-quality blood detection while away from major medical facilities. However, different types of blood cells have different morphologies, especially for rare or mutated cells, and the DWS-YOLO detector may require a large number of labeled samples to obtain the desired detection accuracy. As the sample size of the blood cell detection dataset used in this paper is too small, it may have a negative impact on the accuracy of blood cell detection.

Conclusion

In this paper, we propose DWS-YOLO, a YOLO-based blood cell detector that achieves a suitable balance between detection accuracy, computational complexity, and inference speed. We improve the network structure of YOLOv5-Nano with a lightweight C3 module, an enhanced attention mechanism, an efficient loss function, and improved soft non-maximum suppression. Owing to these enhancements, our detector has great potential in cell detection applications. Several evaluation metrics, such as the number of parameters, computational complexity, and detection accuracy, are considerably improved in our proposed model, as demonstrated by experimental results across different datasets. In comparison with the most advanced object detection models, our model achieves SOTA results and stronger robustness on the BCCD dataset, and it shows outstanding generalization performance on the Raabin-WBC dataset. All of the preceding evidence demonstrates that our proposed detector is a smaller, faster, and more accurate blood cell detector.

In the future, due to the insufficient sample size of our dataset, attempts will be made to combine adversarial semi-supervised learning with small-sample learning so as to obtain strong detection performance in a limited sample size. In addition, due to the small number of parameters and lower computational complexity of the proposed detector, it will be considered for deployment on edge devices with limited computational capacity. This is essential for clinical diagnosis and monitoring applications.


Disclosure Statement

No potential conflict of interest was reported by the author(s).

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Additional information

Funding

This work was supported by Natural Science Foundation of Fujian Province of China [No. 2023J011456 and No.2023J05084], High-level Talents Program of Xiamen University of Technology [No. YKJ22029R and YKJ22028R], and Fujian Province Young and Middle-aged Teachers’ Educational Research Project [No. JAT220334].

References

  • Acharya, V., and P. Kumar. 2017. Identification and red blood cell automated counting from blood smear images using computer-aided system. Medical & Biological Engineering & Computing 56 (3):483–26. doi:10.1007/s11517-017-1708-9.
  • Alam, M. M., and M. Tariqul Islam. 2019. Machine learning approach of automatic identification and counting of blood cells. Healthcare Technology Letters 6 (4):103–08. doi:10.1049/htl.2018.5098.
  • Atkins, C. G., K. Buckley, M. W. Blades, and R. F. Turner. 2017. Raman spectroscopy of blood and blood components. Applied Spectroscopy 71 (5):767–93. doi:10.1177/0003702816686593.
  • Biswas, S., and D. Ghoshal. 2016. Blood cell detection using thresholding estimation based watershed transformation with Sobel filter in frequency domain. Procedia Computer Science 89:651–57. doi:10.1016/j.procs.2016.06.029.
  • Bochkovskiy, A., C.-Y. Wang, and H.-Y. Mark Liao. 2020. Yolov4: Optimal speed and accuracy of object detection. arXiv Preprint arXiv 2004:10934.
  • Bodla, N., B. Singh, R. Chellappa, and L. S. Davis. 2017. Soft-NMS–improving object detection with one line of code. Proceedings of the Ieee International Conference on Computer Vision (ICCV), Venice, Italy, 5562–70. doi:10.1109/ICCV.2017.593.
  • Chan, L. L.-Y., D. J. Laverty, T. Smith, P. Nejad, H. Hei, R. Gandhi, D. Kuksin, and J. Qiu. 2013. Accurate measurement of peripheral blood mononuclear cell concentration using image cytometry to eliminate RBC-induced counting error. Journal of Immunological Methods 388 (1–2):25–32. doi:10.1016/j.jim.2012.11.010.
  • Chao, Y.-L., P.-Y. Wu, J.-C. Huang, Y.-W. Chiu, J.-J. Lee, S.-C. Chen, J.-M. Chang, S.-J. Hwang, and H.-C. Chen. 2022. Hepatic steatosis is associated with high white blood cell and platelet counts. Biomedicines 10 (4):892. doi:10.3390/biomedicines10040892.
  • Chollet, F. 2017. Xception: Deep learning with depthwise separable convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 1800–07. doi:10.1109/CVPR.2017.195.
  • Choudhary, O. P., R. Sarkar, G. E. Chethan, P. Jyoti Doley, P. Chandra Kalita, A. Kalita. 2021. Preparation of blood samples for electron microscopy: The standard protocol. Annals of Medicine & Surgery 70:102895. doi:10.1016/j.amsu.2021.102895.
  • Choudhary, O. P., R. Sarkar, F. A. Madkour, P. Chandra Kalita, P. Jyoti Doley, A. Kalita, P. Choudhary, and C. Gollahalli Eregowda. 2023. Peripheral blood cells of native pig (zovawk) of Mizoram, India: Light and scanning electron microscopy analysis. Microscopy Research and Technique 86 (3):331–41. doi:10.1002/jemt.24274.
  • Everingham, M., S. A. Eslami, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. 2015. The pascal visual object classes challenge: A retrospective. International Journal of Computer Vision 111 (1):98–136. doi:10.1007/s11263-014-0733-5.
  • Farhadi, A., and J. Redmon. 2018. Yolov3: An incremental improvement. arXiv Preprint arXiv 1804:02767.
  • Ge, Z., S. Liu, F. Wang, Z. Li, and J. Sun. 2021. Yolox: Exceeding yolo series in 2021. arXiv Preprint arXiv 2107:08430.
  • Gevorgyan, Z. 2022. SIoU loss: More powerful learning for bounding box regression. arXiv Preprint arXiv 2205:12740.
  • Girshick, R., J. Donahue, T. Darrell, and J. Malik. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv Preprint arXiv 1311:2524.
  • Gulshan, V., L. Peng, M. Coram, M. C. Stumpe, D. Wu, A. Narayanaswamy, S. Venugopalan, K. Widner, T. Madams, J. Cuadros, et al. 2016. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. Jama 316 (22):2402–10. doi:10.1001/jama.2016.17216.
  • Hang, J., Y. Wu, Y. Li, T. Lai, J. Zhang, and Y. Li. 2023. A deep learning semantic segmentation network with attention mechanism for concrete crack detection. Structural Health Monitoring 22 (5):3006–26. doi:10.1177/14759217221126170.
  • He, K., G. Gkioxari, P. Dollár, and R. Girshick. 2017. Mask r-cnn. arXiv Preprint arXiv 1703:06870.
  • He, W., Y. Han, W. Ming, J. Du, Y. Liu, Y. Yang, L. Wang, Y. Wang, Z. Jiang, C. Cao, et al. 2022. Progress of machine vision in the detection of cancer cells in histopathology. IEEE Access 10:46753–71. doi:10.1109/ACCESS.2022.3161575.
  • He, K., X. Zhang, S. Ren, and J. Sun. 2016. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 770–78. doi:10.1109/CVPR.2016.90.
  • Hou, Q., D. Zhou, and J. Feng. 2021. Coordinate attention for efficient mobile network design. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 13713–22. doi:10.1109/CVPR46437.2021.01350.
  • Howard, A., M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang. 2019. Searching for mobilenetv3. Proceedings of the IEEE/CVF International Conference on Computer Vision (CVPR), Seoul, Korea (South), 1314–24. doi:10.1109/ICCV.2019.00140.
  • Howard, A. G., M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam. 2017. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv Preprint arXiv 1704:04861.
  • Huang, W., Y. Huo, S. Yang, M. Liu, H. Li, and M. Zhang. 2023. Detection of laodelphax striatellus (small brown planthopper) based on improved YOLOv5. Computers and Electronics in Agriculture 206:107657. doi:10.1016/j.compag.2023.107657.
  • Hu, J., L. Shen, and G. Sun. 2018. Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 7132–41. doi:10.1109/CVPR.2018.00745.
  • Hwang, B., J. Kim, S. Lee, E. Kim, J. Kim, Y. Jung, and H. Hwang. 2022. Automatic detection and segmentation of thrombi in abdominal aortic aneurysms using a mask region-based convolutional neural network with optimized loss functions. Sensors 22 (10):3643. doi:10.3390/s22103643.
  • Islam, M. T., M. Abdul Aowal, A. Tahseen Minhaz, and K. Ashraf. 2017. Abnormality detection and localization in chest x-rays using deep convolutional neural networks. arXiv Preprint arXiv 1705:09850.
  • Jocher, G. 2020. YOLOv5 by Ultralytics v 6.1. https://github.com/ultralytics/yolov5.
  • Kouzehkanan, Z. M., S. Saghari, S. Tavakoli, P. Rostami, M. Abaszadeh, F. Mirzadeh, E. Shahabi Satlsar, M. Gheidishahran, F. Gorgi, S. Mohammadi, et al. 2022. A large dataset of white blood cells containing cell locations and types, along with segmented nuclei and cytoplasm. Scientific Reports 12(1):1123. doi:10.1038/s41598-021-04426-x.
  • Kutlu, H., E. Avci, and F. Özyurt. 2020. White blood cells detection and classification based on regional convolutional neural networks. Medical Hypotheses 135:109472. doi:10.1016/j.mehy.2019.109472.
  • Li, C., L. Li, H. Jiang, K. Weng, Y. Geng, L. Li, Z. Ke. 2022. YOLOv6: A single-stage object detection framework for industrial applications. arXiv Preprint arXiv 2209:02976.
  • Li, H., J. Li, H. Wei, Z. Liu, Z. Zhan, and Q. Ren. 2022. Slim-neck by GSConv: A better design paradigm of detector architectures for autonomous vehicles. arXiv Preprint arXiv 2206:02424.
  • Liu, K. 2022. Stbi-YOLO: A real-time object detection method for lung nodule recognition. IEEE Access 10:75385–94. doi:10.1109/ACCESS.2022.3192034.
  • Liu, W., D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. 2016. Ssd: Single shot multibox detector. Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, 21–37. Springer.
  • Liu, R., A. K. M. Arifuzzman, N. Wang, O. Civelekoglu, and A. Fatih Sarioglu. 2020. Electronic immunoaffinity assay for differential leukocyte counts. Journal of Microelectromechanical Systems 29 (5):942–47. doi:10.1109/JMEMS.2020.3012305.
  • Ma, N., X. Zhang, H.-T. Zheng, and J. Sun. 2018. Shufflenet v2: Practical guidelines for efficient cnn architecture design. arXiv Preprint arXiv 1807:11164.
  • Pacal, I., and D. Karaboga. 2021. A robust real-time deep learning based automatic polyp detection system. Computers in Biology and Medicine 134:104519. doi:10.1016/j.compbiomed.2021.104519.
  • Patil, P. R., G. S. Sable, and G. Anandgaonkar. 2014. Counting of WBCs and RBCs from blood images using gray thresholding. International Journal of Research in Engineering and Technology 3 (4):391–95. doi:10.15623/ijret.2014.0304071.
  • Pei, D., M. Jing, H. Liu, F. Sun, and L. Jiang. 2020. A fast RetinaNet fusion framework for multi-spectral pedestrian detection. Infrared Physics & Technology 105:103178. doi:10.1016/j.infrared.2019.103178.
  • Redmon, J., S. Divvala, R. Girshick, and A. Farhadi. 2015. You only look once: Unified, real-time object detection. arXiv Preprint arXiv 1506:02640.
  • Redmon, J., and A. Farhadi. 2017. YOLO9000: better, faster, stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 6517–25. doi:10.1109/CVPR.2017.690.
  • Ren, S., K. He, R. Girshick, and J. Sun. 2017. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (6):1137–49 doi:10.1109/TPAMI.2016.2577031.
  • Roboflow. 2021. BCCD dataset. Accessed February 2021. https://public.roboflow.com/object-detection/bccd/.
  • Sandler, M., A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen. 2018. Mobilenetv2: Inverted residuals and linear bottlenecks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 4510–20. doi:10.1109/CVPR.2018.00474.
  • Selvaraju, R. R., M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra. 2017. Grad-cam: Visual explanations from deep networks via gradient-based localization. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 618–26. doi:10.1109/ICCV.2017.74.
  • Shakarami, A., M. Bagher Menhaj, A. Mahdavi-Hormat, and H. Tarrah. 2021. A fast and yet efficient YOLOv3 for blood cell detection. Biomedical Signal Processing and Control 66:102495. doi:10.1016/j.bspc.2021.102495.
  • Shao, M., P. He, Y. Zhang, S. Zhou, N. Zhang, and J. Zhang. 2023. Identification method of cotton leaf diseases based on bilinear coordinate attention enhancement module. Agronomy 13 (1):88. doi:10.3390/agronomy13010088.
  • Sun, Y., J. Song, Y. Li, Y. Li, S. Li, and Z. Duan. 2023. IVP-YOLOv5: An intelligent vehicle-pedestrian detection method based on YOLOv5s. Connection Science 35 (1):2168254. doi:10.1080/09540091.2023.2168254.
  • Talukdar, K., K. Bora, L. B. Mahanta, and A. K. Das. 2022. A comparative assessment of deep object detection models for blood smear analysis. Tissue and Cell 76:101761. doi:10.1016/j.tice.2022.101761.
  • Tavakoli, S., A. Ghaffari, Z. Mousavi Kouzehkanan, and R. Hosseini. 2021. New segmentation and feature extraction algorithm for classification of white blood cells in peripheral smear images. Scientific Reports 11 (1):19428. doi:10.1038/s41598-021-98599-0.
  • Wang, C.-Y., A. Bochkovskiy, and H.-Y. Mark Liao. 2022. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv Preprint arXiv 2207:02696.
  • Wang, H., D. Han, M. Cui, and C. Chen. 2023. NAS-YOLOX: A SAR ship detection using neural architecture search and multi-scale attention. Connection Science 35 (1):1–32. doi:10.1080/09540091.2023.2257399.
  • Wang, Q., B. Wu, P. Zhu, P. Li, W. Zuo, and Q. Hu. 2020. ECA-Net: Efficient channel attention for deep convolutional neural networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 11534–42. doi:10.1109/CVPR42600.2020.01155.
  • Woo, S., J. Park, J.-Y. Lee, and I. So Kweon. 2018. Cbam: Convolutional block attention module. arXiv Preprint arXiv 1807:06521.
  • Xia, T., Y. Q. Fu, N. Jin, P. Chazot, P. Angelov, and R. Jiang. 2020. AI-enabled microscopic blood analysis for microfluidic COVID-19 hematology. 2020 5th International Conference on Computational Intelligence and Applications (ICCIA), Beijing, China, 98–102. IEEE. doi:10.1109/ICCIA49625.2020.00026.
  • Xu, F., X. Li, H. Yang, Y. Wang, and W. Xiang. 2022. TE-YOLOF: Tiny and efficient YOLOF for blood cell detection. Biomedical Signal Processing and Control 73:103416. doi:10.1016/j.bspc.2021.103416.
  • Yadav, A., and A. Yadav. 2021. An intelligent model for the detection of white blood cells using artificial intelligence. Computer Methods and Programs in Biomedicine 199:105893. doi:10.1016/j.cmpb.2020.105893.
  • Yang, X., J. Zhao, H. Zhang, C. Dai, L. Zhao, Z. Ji, and I. Ganchev. 2022. Remote sensing image detection based on YOLOv4 improvements. Institute of Electrical and Electronics Engineers Access 10:95527–38. doi:10.1109/ACCESS.2022.3204053.
  • Yu, J., Y. Jiang, Z. Wang, Z. Cao, and T. Huang. 2016. Unitbox: An advanced object detection network. Proceedings of the 24th ACM International Conference on Multimedia, New York, NY, USA, 516–20. doi:10.1145/2964284.2967274.
  • Zhang, Y.-F., W. Ren, Z. Zhang, Z. Jia, L. Wang, and T. Tan. 2022. Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing 506:146–57. doi:10.1016/j.neucom.2022.07.042.
  • Zhang, X., X. Zhou, M. Lin, and J. Sun. 2018. Shufflenet: An extremely efficient convolutional neural network for mobile devices. arXiv Preprint arXiv 1707:01083.
  • Zhao, M., J. Kawahara, K. Abhishek, S. Shamanian, and G. Hamarneh. 2022. Skin3D: Detection and longitudinal tracking of pigmented skin lesions in 3D total-body textured meshes. Medical Image Analysis 77:102329. doi:10.1016/j.media.2021.102329.
  • Zhao, Z., N. Lv, R. Xiao, Q. Liu, and S. Chen. 2023. Recognition of penetration states based on arc sound of interest using VGG-SE network during pulsed GTAW process. Journal of Manufacturing Processes 87:81–96. doi:10.1016/j.jmapro.2022.12.034.
  • Zheng, Z., P. Wang, W. Liu, J. Li, R. Ye, and D. Ren. 2020. Distance-IoU loss: Faster and better learning for bounding box regression. arXiv Preprint arXiv 1911:08287.
  • Zheng, Z., P. Wang, D. Ren, W. Liu, R. Ye, Q. Hu, and W. Zuo. 2021. Enhancing geometric factors in model learning and inference for object detection and instance segmentation. IEEE Transactions on Cybernetics 52 (8):8574–86. doi:10.1109/TCYB.2021.3095305.