Research Article

AWDS-net: automatic whole-field segmentation network for characterising diverse breast masses

Jiajia Jiao, Yingzhao Chen, Zhiyu Li & Tien-Hsiung Weng
Article: 2289836 | Received 02 Jul 2023, Accepted 27 Nov 2023, Published online: 08 Jan 2024

Abstract

Breast masses that vary in size, shape and location make accurate image segmentation challenging for a unified deep-learning network. Therefore, based on the U-net network, an adaptive automatic whole-field segmentation network (AWDS-net) for characterising diverse breast masses is proposed in this paper to assist more accurate and faster medical diagnosis. In the encoder of AWDS-net, a small mass extraction mechanism (SMEM) is designed to better retain fine-grained location information of small masses, while a spatial pyramid module (SPM) is added to capture multi-scale context and high-resolution image information. In the decoder, an attention gate (AG) mechanism is inserted so that the model automatically focuses on useful target-region information; the extracted features build a symmetric encoder-decoder structure for automatic whole-field segmentation of multiple masses. Experimental results on the open-source breast cancer dataset Digital Database for Screening Mammography (DDSM) show that, compared with U-net, Attention-Unet, R2U-Net and SegNet, the proposed AWDS-net improves image segmentation metrics by up to 3.16% in accuracy, 20.59% in sensitivity, 5.23% in specificity, 10.27% in precision, 15.08% in IoU and 14.21% in F1-score with acceptable training time.

1. Introduction

Breast cancer is known as the “pink killer” and poses a great threat to women’s health. In 2020, approximately 2,300,000 new cases were reported, among which 30% of patients died from this disease (Duffy et al., Citation2020). Regular breast examination is necessary for women to achieve early detection and early treatment through early screening. Compared with computed tomography (CT), optical coherence tomography (OCT) and magnetic resonance imaging (MRI), mammography has proven the best choice for detecting breast cancer because it is much safer and faster (Ng & Muttarak, Citation2003). However, the complex features of diverse breast masses in mammography make detection time-consuming, and the misdetection rate reaches up to 30% (Mulley et al., Citation2012). Computer-aided diagnosis (CAD) systems have been developed for fast and accurate breast cancer detection (Maqsood et al., Citation2022). Breast cancer diagnosis is often influenced by the number, size and location of masses in breast tissue, so accurate segmentation of masses in breast images is particularly significant in a breast cancer CAD system (Kamba et al., Citation2021). The current challenges of breast image segmentation come from irregularities in shape, size, boundary and location: small masses are hard to characterise and need extra boundary refinement, variable sizes require multi-scale feature extraction, and the varying distances between masses make it difficult to separate masses from irrelevant information.

With the rapid development of artificial intelligence techniques, deep learning methods now dominate medical image segmentation (Litjens et al., Citation2017; Di et al., Citation2022). The encoder-decoder network U-net proposed by Ronneberger et al. (Citation2015) has been successfully applied to many medical image segmentation tasks to date. To improve its segmentation performance, various U-net variants such as R2U-net (Alom et al., Citation2018), Attention-Unet (Oktay et al., Citation2018), STAN (Shareef et al., Citation2020), Arf-net (Xu et al., Citation2022), Focal U-Net (Zhao et al., Citation2022), SCSGNet (Li et al., Citation2023), EU-net (Chowdary & Yoagarajah, Citation2023), AAU-Net (Chen et al., Citation2023), CFU-Net (Yin & Shao, Citation2023) and the cascade feature extraction enhanced U-Net (Zarbakhsh, Citation2023) have been proposed.

All the above deep learning-assisted methods improve image segmentation performance, but some problems remain unsolved for the complex case of multiple masses. One is small mass characterisation, which often depends on complex designs such as dual encoders. Another is that multi-scale feature extraction for multiple masses of different sizes and shapes is not cost-effective and requires complex skip connections. More importantly, these works usually address only one of the challenges of small masses, multi-scale features and global information in U-net-based medical image segmentation. To address these problems jointly in a cost-effective way, an Automatic Whole-fielD Segmentation network (AWDS-net) for characterising diverse breast masses is proposed in this paper to improve segmentation quality. The main contributions include:

  • We design an enhanced U-net model for more accurate breast image segmentation that assists doctors in fast screening and diagnosis of breast cancers. The proposed AWDS-net accurately characterises breast masses by size, shape and location directly in whole-field images, avoiding the need to relocate segmented regions back into the original full-field images.

  • We conduct an ablation study to demonstrate the effectiveness of the novel techniques in AWDS-net. The proposed AWDS-net adopts a small mass extraction mechanism (SMEM) and a spatial pyramid module (SPM) in the encoder to extract multi-scale feature information of diverse masses, and an attention gate (AG) scheme in the decoder to focus on local mass information within global images, enhancing the U-net framework seamlessly. The three techniques work well together to improve image segmentation quality, with up to a 15.08% IoU and 14.21% F1-score increase.

  • We compare with the latest methods of medical image segmentation and analyse the results and speed qualitatively and quantitatively. The proposed AWDS-net achieves higher image segmentation metrics, with improvements of up to 3.16% in accuracy, 20.59% in sensitivity, 5.23% in specificity, 10.27% in precision, 15.08% in IoU and 14.21% in F1-score with acceptable training time.

The paper is organised as follows. Section 1 gives the introduction and Section 2 describes the related work. Section 3 details the proposed AWDS-net, while the comprehensive results and analysis are presented in Section 4. The conclusion is summarised in Section 5.

2. Related work

According to the segmentation technique, mass segmentation of breast images is mainly divided into two branches: traditional image segmentation and deep learning-based image segmentation.

The traditional image segmentation methods include edge-based segmentation (Yang et al., Citation2016), region-based segmentation (Zebari et al., Citation2020), threshold-based segmentation (Guo et al., Citation2016) and others. These traditional algorithms are not effective on images with heavy interference or complex content because some high-order features are often ignored. In comparison, deep learning algorithms have remarkable advantages in extracting more high-order feature information and in effective segmentation automation (Litjens et al., Citation2017).

Due to its advantage in extracting high-order feature information, deep learning currently dominates breast image segmentation. Since Cireşan et al. (Citation2013) used a convolutional neural network (CNN) to process breast cancer images, CNNs have become popular in the medical image field. In particular, Long et al. (Citation2015) proposed the fully convolutional network (FCN), which accepts input images of variable size and classifies images at the pixel level. The DeepLab series builds on FCN and further integrates an atrous (dilated) convolution pyramid module so that the network can extract context information at different scales (Chen et al., Citation2017). More importantly, after the encoder-decoder network U-net (Ronneberger et al., Citation2015) was proposed, many medical image segmentation approaches started to use the U-shaped network structure, because the skip connections in U-net integrate low-level and deep features.

Various U-net variants have been developed for higher segmentation quality. R2U-net (Alom et al., Citation2018) combines U-Net, the residual network and the recurrent convolutional neural network to exploit their strengths together. Attention-Unet uses a novel attention gate (AG) model to automatically learn target structures of varying shapes and sizes (Oktay et al., Citation2018). STAN (Shareef et al., Citation2020) integrates both rich context information and high-resolution image features in a small tumour-aware network. Arf-net (Xu et al., Citation2022) targets small mass characterisation using selective and multiple receptive fields for precise breast mass segmentation. Focal U-Net (Zhao et al., Citation2022) uses a focal self-attention block to improve breast lesion segmentation. AAU-Net (Chen et al., Citation2023) adopts a hybrid adaptive attention module to replace the traditional convolution operation. SCSGNet (Li et al., Citation2023) takes advantage of both global context extraction and local boundary refinement using three techniques: series-parallel feature fusion, dynamic long-range correlation capture and triplet attention guidance. EU-net (Chowdary & Yoagarajah, Citation2023) enhances multi-scale feature extraction and fusion as well as skip connection reconstruction to accurately segment breast masses. CFU-Net (Yin & Shao, Citation2023) uses two embedded U-Nets and designs a multilevel attention module (MLAM) for multilevel information interaction. The cascade feature extraction enhanced U-Net combines a spatial attention mechanism to characterise the subtle features of breast tumours (Zarbakhsh, Citation2023).

Even though the above U-Net-based works have been successfully applied to medical image segmentation, they still have some limitations. Firstly, the U-Net method performs poorly in small mass characterisation, which often requires complex dual encoders (Shareef et al., Citation2020) or receptive field modules (Xu et al., Citation2022). Secondly, multi-scale feature extraction in the encoder mostly needs extra skip connections (Chowdary & Yoagarajah, Citation2023) so that different masses can be characterised effectively. Additionally, these U-Net-based works often address only one or some of the challenges of small mass identification, multi-scale feature extraction and global information consideration. How to solve these problems cost-effectively remains open. In this paper, we propose AWDS-net, which uses the three modules SMEM, SPM and AG to reconstruct an effective encoder-decoder U-shaped structure for accurate and fast breast image segmentation.

3. Proposed AWDS-net

3.1. Framework of AWDS-net

Aiming at the task of whole-field segmentation of multiple masses in mammography (breast molybdenum target) images, a novel deep neural network AWDS-net based on U-net is designed with a new encoder and decoder, as shown in Figure 1.

Figure 1. Overall AWDS-net structure. The AWDS-net is enhanced by SMEM, SPM and AG schemes based on U-net jointly.


In the encoder of AWDS-net, the basic max-pooling layers of the original U-Net preserve the main features and greatly reduce the number of parameters along the downsampling path. The new SMEM and SPM modules are both inserted to extract more local mass information and global breast information. In the decoder, the upsampling layers of the original U-Net carry out the basic upsampling procedure, while the newly added AG modules highlight mass information and suppress irrelevant areas of the whole-field images along the upsampling path. AWDS-net can therefore segment a variety of multiple masses in a mammography image.

AWDS-net adopts a symmetric U-shaped encoder-decoder structure. In Figure 1, 64, 128, 256, 512 and 1024 denote the number of channels in each layer, and the grey horizontal arrows denote skip connections. Once the pre-processed dataset is fed to the network, it first goes through two 3 × 3 convolution kernels with ReLU, as in the original U-net, followed by the extra SMEM and SPM modules. Each SMEM module is composed of a 1 × 1 convolution kernel with ReLU, a 3 × 3 convolution kernel with ReLU and a 5 × 5 convolution kernel with ReLU in parallel, fully retaining the characteristic information of multiple different masses. Then the SPM follows, composed of cascaded dilated (atrous) convolution modules with dilation rates of 1, 6, 12 and 18, which effectively fuses the multi-scale information of multiple breast masses of different sizes. The yellow region 1 represents the 32 × 32 feature map obtained after downsampling in the encoder, while the yellow region 2 is the 28 × 28 feature map for subsequent upsampling in the decoder. An AG module is added before each upsampling layer to make the network focus on mass information, which helps recover and extract mass details. Finally, the feature map is upsampled by the decoder and the segmented breast masses are obtained.

3.2. SMEM module

Breast mass sizes differ greatly between patients, and masses within the same individual also vary in size. The U-net for medical image segmentation should therefore be improved for small masses. Inspired by the multiple convolution kernels of the small tumour-aware network (STAN, Shareef et al., Citation2020), we design a new SMEM module that enhances the encoder to extract more information about small masses. The primary difference is that STAN uses multiple convolution kernels in two encoder branches five times to extract small mass information, whereas our AWDS-net adopts the multiple convolution kernels with ReLU as SMEM in four encoder layers and cooperates with the following SPM for multi-scale information extraction, retaining fine-grained location information for further processing.

SMEM uses three convolution kernels of different sizes in each convolution layer of the encoder to construct the feature map in Figure 2. The kernel sizes are 1 × 1, 3 × 3 and 5 × 5, and only one kernel of each size is needed. SMEM sums the information extracted by the three convolutions to capture the information of small masses. To counter the decrease in the number of image channels during downsampling, the 1 × 1 convolution is used in the SMEM structure. Meanwhile, the ReLU layer further enhances the non-linear representation of AWDS-net. The larger the convolution kernel, the larger the receptive field and the more image information the network sees, yielding better features; however, too large a kernel degrades computing performance. Therefore, AWDS-net adopts the kernel combination of 1 × 1, 3 × 3 and 5 × 5 shown in Figure 2.
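To make the design concrete, below is a minimal PyTorch sketch of an SMEM block consistent with this description; the output channel count and the zero-padding used to keep the three branch outputs the same size are our assumptions, since the paper does not specify them.

    import torch
    import torch.nn as nn

    class SMEM(nn.Module):
        """Small mass extraction mechanism (Sec. 3.2): parallel 1x1, 3x3 and
        5x5 convolutions, each followed by ReLU, whose outputs are summed."""
        def __init__(self, in_ch: int, out_ch: int):
            super().__init__()
            self.b1 = nn.Sequential(nn.Conv2d(in_ch, out_ch, 1), nn.ReLU(inplace=True))
            self.b3 = nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))
            self.b5 = nn.Sequential(nn.Conv2d(in_ch, out_ch, 5, padding=2), nn.ReLU(inplace=True))

        def forward(self, x):
            # Summation fuses the three receptive fields while keeping the
            # channel count fixed, retaining fine-grained small mass details.
            return self.b1(x) + self.b3(x) + self.b5(x)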

Figure 2. Structure of SMEM. The inserted SMEM considers the variable convolution kernels to characterise the different mass sizes.


With the cooperation of the following SPM, more receptive fields of different sizes are obtained. In this way, the detailed information of small masses is retained without wasting too much computational power, which benefits the segmentation of masses of different sizes in full-field mammography images. Conversely, if SPM were placed before SMEM, the small mass information could not be handled as cost-effectively for multi-scale mass extraction from breast tissue.

3.3. SPM module

To identify and segment multiple masses of variable size in breast images, dilated (atrous) convolution with different dilation rates is used to extract multi-scale image features, so that the segmentation model can perceive masses of variable sizes through different receptive fields and produce output through a parallel spatial pyramid module. The ASPP mechanism (Wang et al., Citation2020) can effectively identify breast masses of different sizes: it preserves local information and details, works well for segmenting large target masses, and avoids the information discontinuity caused by using a single convolution for feature extraction.

Inspired by the ASPP design (Wang et al., Citation2020), we use a similar SPM module composed of dilated convolutions with dilation rates of 1, 6, 12 and 18, as shown in Figure 3. The dilated convolution with a dilation rate of 1 retains the details of small masses, while rates of 6, 12 and 18 provide more global information for small, medium and large target masses. With a 3 × 3 kernel, a dilation rate of 1 is equivalent to an ordinary 3 × 3 convolution, a rate of 6 to an ordinary 23 × 23 convolution, a rate of 12 to an ordinary 47 × 47 convolution and a rate of 18 to an ordinary 71 × 71 convolution. Eq. (1) determines the receptive field (RF) of a padded dilated convolution, where dr is the dilation rate and k is the convolution kernel size:

(1) RF = 2(dr − 1)(k − 1) + k

For example, with dr = 18 and k = 3, RF = 2 × (18 − 1) × (3 − 1) + 3 = 71. This design obtains mass feature information at multiple scales, which benefits the segmentation of masses of different sizes in mammography images. The dilated convolution operation can be expressed as

(2) yi = Σk x(i + r·k) ω(k)

where x is the convolution input, r is the dilation rate, ω is the convolution kernel, k indexes the kernel positions and i indexes locations in the input and output.
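As a concrete illustration, a minimal PyTorch sketch of such an SPM is given below. The four parallel 3 × 3 branches use the dilation rates above; the 1 × 1 fusion convolution after concatenation is our assumption, since the paper only states that the branch outputs are stitched together.

    import torch
    import torch.nn as nn

    class SPM(nn.Module):
        """Spatial pyramid module (Sec. 3.3): parallel 3x3 dilated convolutions
        with dilation rates 1, 6, 12 and 18, i.e. receptive fields of
        3, 23, 47 and 71 pixels by Eq. (1)."""
        def __init__(self, in_ch: int, out_ch: int, rates=(1, 6, 12, 18)):
            super().__init__()
            self.branches = nn.ModuleList([
                nn.Sequential(
                    # padding == dilation keeps the spatial size unchanged for k = 3
                    nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r),
                    nn.ReLU(inplace=True))
                for r in rates])
            self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)  # assumed fusion

        def forward(self, x):
            return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))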

Figure 3. The structure of SPM uses dilated convolutions with different dilation rates, inspired by Wang et al. (Citation2020), to extract multi-scale breast mass features in the proposed AWDS-net.


The feature information flows from the SMEM module to the SPM module. First, the dilated convolution with rate 1 extracts the mammographic mass information. As part of the output of the final SPM module, its results are concatenated with the feature maps extracted by the dilated convolutions with rates 6, 12 and 18, obtaining features over different fields of view. Dilated convolution can reduce the loss of spatial features without shrinking the receptive field, but it may lose continuous information such as edges, which harms the segmentation of small masses. We therefore place the SMEM mechanism in series before SPM to supplement feature information such as small masses and edge details, so that detailed multi-scale information is extracted to improve segmentation quality.

3.4. AG module

The attention gate mechanism is also used in image segmentation to distinguish the importance of different feature maps (Oktay et al., Citation2018). Since multiple masses have different sizes, the AG module is added to activate useful features and suppress unrelated areas in both low-level and high-level features, thus improving the segmentation quality and agility of the model.

The attention gate diagram used in our proposed AWDS-net is shown in Figure 4. Based on U-net, l denotes the network layer index and i denotes the pixel position. xl and gi are the input feature map signal and the gating signal from the selected channel, respectively. Wg represents the weight matrix applied to the gating signal and Wx the weight matrix applied to the input feature map. σ1 denotes the rectified linear unit (ReLU) activation function and σ2 the sigmoid activation function. αl denotes the gating coefficient of the attention mechanism, with 0 ≤ αl ≤ 1. φ is the weight parameter vector of the 1 × 1 convolution. The attention mechanism is expressed in Eq. (3):

(3) AGl = σ2(φ[σ1(gi·Wg + xl·Wx + b)])

where b is the bias, σ1(x) = max(0, x) increases the non-linear representation, and σ2(x) = 1/(1 + exp(−x)) prevents overly sparse features.
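A minimal PyTorch sketch of this additive attention gate, following Eq. (3), is given below; the intermediate channel count and the assumption that g and x have already been resampled to a common spatial size are borrowed from the original Attention U-Net design (Oktay et al., Citation2018).

    import torch
    import torch.nn as nn

    class AttentionGate(nn.Module):
        """Additive attention gate (Sec. 3.4):
        alpha = sigma2(phi(sigma1(Wg*g + Wx*x + b)))."""
        def __init__(self, g_ch: int, x_ch: int, inter_ch: int):
            super().__init__()
            self.wg = nn.Conv2d(g_ch, inter_ch, 1, bias=True)   # Wg (carries bias b)
            self.wx = nn.Conv2d(x_ch, inter_ch, 1, bias=False)  # Wx
            self.phi = nn.Conv2d(inter_ch, 1, 1, bias=True)     # 1x1 weight vector phi
            self.relu = nn.ReLU(inplace=True)                   # sigma_1
            self.sigmoid = nn.Sigmoid()                         # sigma_2

        def forward(self, g, x):
            # alpha in [0, 1] highlights mass regions and suppresses background.
            alpha = self.sigmoid(self.phi(self.relu(self.wg(g) + self.wx(x))))
            return x * alpha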

Figure 4. AG structure. AWDS-net uses attention gate mechanisms (Oktay et al., Citation2018) to improve the decoder by focusing on the information of masses.


The AG module thus assists the proposed AWDS-net in highlighting breast mass features against the surrounding breast tissue.

4. Experiments and analysis

4.1. Experimental settings and metrics

We used the images of malignant masses from the cancer category of the open-source dataset Digital Database for Screening Mammography (DDSM) (Heath et al., Citation2000) and applied data augmentation before feeding the segmentation models. Dr Li, our co-author, helped to select 200 images of malignant masses from the DDSM dataset for this study. The selected 200 breast images cover various mass distributions, such as a single large mass, a single small mass, multiple small masses, and a mixture of a large mass with several small masses. The images were rotated randomly within 0–180 degrees and scaled by a ratio between 0 and 0.2 relative to the original images. The original dataset was thus extended to about 1000 images, from which 800 consistent with the pathological characteristics of the masses were chosen. This preprocessing orients mass features in different directions and improves the model’s ability to extract multi-scale features. All images are then resized to a unified 512 × 512. The pre-processed DDSM is divided into training, validation and test sets at a ratio of 3:1:1. The PyTorch framework is used in the experiments. Based on experimental comparison, the important parameters such as optimiser, learning rate, batch size and number of iterations are tuned for higher segmentation performance on breast images.
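For illustration, a torchvision-based sketch of this preprocessing is shown below. The exact pipeline is our assumption (the paper does not name an augmentation library), the 0–0.2 scaling ratio is mapped to a scale factor range of 1.0–1.2, and in practice the same geometric transform must also be applied to the ground-truth mask.

    import torchvision.transforms as T

    preprocess = T.Compose([
        T.RandomRotation(degrees=180),                # random rotation within 0-180 degrees
        T.RandomAffine(degrees=0, scale=(1.0, 1.2)),  # random scaling, ratio 0-0.2 over original
        T.Resize((512, 512)),                         # unify the input size to 512 x 512
        T.ToTensor(),
    ])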

Commonly used optimisers include SGD, Momentum and Adam; the Adam algorithm combines the RMSProp algorithm with a corrected Momentum. With other experimental parameters unchanged, the experimental comparison in Figure 5(a) shows that the model performs best with the Adam optimiser. Therefore, the Adam optimiser is selected for training, with momentum1 and momentum2 set to 0.9 and 0.999, respectively.
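In PyTorch, this configuration corresponds to the sketch below, where momentum1 and momentum2 map onto Adam’s beta coefficients; the placeholder module stands in for the AWDS-net model.

    import torch
    import torch.nn as nn

    model = nn.Conv2d(1, 1, 3)  # placeholder standing in for the AWDS-net model
    optimizer = torch.optim.Adam(
        model.parameters(),
        lr=1e-3,              # learning rate selected in Figure 5(b)
        betas=(0.9, 0.999),   # momentum1 = 0.9, momentum2 = 0.999
    )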

Figure 5. Varying parameters impact the accuracy and speed of AWDS-net image segmentation. (a) Varying optimiser impacts on accuracy. (b) Varying learning rate impacts on accuracy. (c) Varying learning rate impacts training time.


The neural network mainly controls the gradient update speed of the loss function through the learning rate, and the initial learning rate is an important hyperparameter affecting the training effect of the model. Based on the comparison of experimental results, the learning rate is initialised to 0.001 for the highest accuracy in Figure 5(b). We also observe that the model accuracy no longer changes after 150 iterations, so the number of training iterations is set to 150 to save computing resources and training time, as shown in Figure 5(c).

The loss function estimates the gap between the Segmentation Result (SR) of the model and the Ground Truth (GT); the smaller its value, the better the model. The loss function used for training in this experiment is the Dice Loss. Unlike weighted loss functions, it does not require class reweighting for unbalanced segmentation tasks. The Dice Loss is defined in Eq. (4):

(4) Dice Loss = 1 − 2|X ∩ Y| / (|X| + |Y|)

where Y represents the predicted image and X represents the ground-truth image. The larger the Dice Loss, the larger the gap between the prediction and the ground truth, and further optimisation of the model is required.
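A minimal PyTorch implementation of Eq. (4) might look as follows; the smoothing constant eps is our addition for numerical stability and is not part of the paper’s definition.

    import torch

    def dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
        """Soft Dice loss, Eq. (4): 1 - 2|X ∩ Y| / (|X| + |Y|).
        pred holds per-pixel probabilities, target the binary ground truth."""
        pred = pred.reshape(pred.size(0), -1)
        target = target.reshape(target.size(0), -1)
        inter = (pred * target).sum(dim=1)           # |X ∩ Y|
        denom = pred.sum(dim=1) + target.sum(dim=1)  # |X| + |Y|
        return (1 - (2 * inter + eps) / (denom + eps)).mean()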

In this experiment, the following general metrics for medical image segmentation are used to verify the effectiveness of the proposed AWDS-net: accuracy (ACC), sensitivity (SE), specificity (SP), precision (PPV), IoU (intersection over union, also known as the Jaccard index) and F1-score (F1, also called the Dice coefficient in image segmentation, which can be computed jointly from PPV and SE). These metrics are defined as follows:

(5) ACC = (TP + TN) / (TP + TN + FP + FN)
(6) SE = TP / (TP + FN)
(7) SP = TN / (TN + FP)
(8) PPV = TP / (TP + FP)
(9) IoU = TP / (TP + FP + FN)
(10) F1 = 2·PPV·SE / (PPV + SE) = 2TP / (2TP + FP + FN)

where positive pixels are those in the mass region and negative pixels are outside it; true negative (TN) is the number of pixels correctly classified as negative; false negative (FN) is the number of pixels mistakenly classified as negative; true positive (TP) is the number of pixels correctly classified as lesion; and false positive (FP) is the number of background pixels mistakenly classified as positive.
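These pixel-level metrics can be computed from the predicted and ground-truth binary masks as in the following NumPy sketch; thresholding of soft predictions and averaging over the test set are left to the caller, and the sketch assumes both classes are present so that no denominator is zero.

    import numpy as np

    def segmentation_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
        """Compute Eqs. (5)-(10) from binary masks of identical shape."""
        pred, gt = pred.astype(bool), gt.astype(bool)
        tp = np.sum(pred & gt)    # mass pixels correctly detected
        tn = np.sum(~pred & ~gt)  # background correctly rejected
        fp = np.sum(pred & ~gt)   # background mistaken for mass
        fn = np.sum(~pred & gt)   # mass pixels missed
        return {
            "ACC": (tp + tn) / (tp + tn + fp + fn),
            "SE":  tp / (tp + fn),
            "SP":  tn / (tn + fp),
            "PPV": tp / (tp + fp),
            "IoU": tp / (tp + fp + fn),
            "F1":  2 * tp / (2 * tp + fp + fn),
        }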

4.2. Ablation study

The metrics described in Section 4.1 are used to evaluate the segmentation performance improvements of the three techniques in the proposed AWDS-net; the experimental results of the ablation study are shown in Table 1. U-net-S is U-net with the SMEM module; U-net-SG is U-net with the SMEM module and AG mechanism; U-net-P is U-net with the SPM module; U-net-SP is U-net with both the SMEM and SPM modules; AWDS-net is the overall model.

Table 1. Comparison of experimental results of improved network for AWDS-net ablation study.

Table 1 compares the five enhanced U-Net networks built from the three techniques SMEM, SPM and AG, where bold font marks the optimal value for each metric. From the results, the final AWDS-net performs best, achieving the maximum values of accuracy, sensitivity, specificity, IoU and F1-score among the five networks. Firstly, the SMEM-based U-net-S characterises small masses well, with higher IoU (0.4521) and F1 (0.6226) than U-net (0.4349 and 0.6061, respectively). However, it only considers small masses, so larger masses are neglected and its overall accuracy is slightly lower than U-net’s. The combination of SMEM and AG in U-net-SG further considers local mass information and global breast information together and outperforms both U-net and U-net with SMEM alone. Similarly, SPM alone (U-net-P) and the combination of SPM and SMEM (U-net-SP) are evaluated. SPM brings higher performance because it improves multi-scale mass information extraction. Interestingly, U-net-SP has better precision, IoU and F1 but lower accuracy, sensitivity and specificity, mainly because it improves only part of the image segmentation. Therefore, combining the three mechanisms is necessary for more accurate image segmentation.

It is also observed that U-net-SP has the highest PPV while its other metrics are lower. This combination of SMEM and SPM focuses on encoder optimisation for more information extraction, so the PPV reaches a high value, while the lack of an attention mechanism to focus on the masses within breast tissue leads to lower accuracy, SE and SP.

The experimental results demonstrate that AWDS-net makes good use of the three modules AG, SMEM and SPM, each of which brings its own improvement. Therefore, AWDS-net is a good choice for breast image segmentation, using the three modules to enhance multi-scale information extraction and to focus on breast masses in whole-field images.

This study focuses on whole-field segmentation of multiple masses of different sizes. To evaluate the segmentation results more intuitively, four mammography images, containing a small mass, a large mass, multiple small masses and multiple large masses, are selected for qualitative comparison, as shown in Figures 6–9, respectively. From left to right and top to bottom are the tagged original image and the segmentation results of the five improved networks.

Figure 6. Segmentation comparison of a single small mass. (a) Tagged original image. (b) U-net-S Segmentation result. (c) U-net-SG Segmentation result. (d) U-net-P Segmentation result. (e) U-net-SP Segmentation result. (f) Proposed AWDS-net Segmentation result.


Figure 7. Segmentation comparison of a large mass. (a) Tagged original image. (b) U-net-S Segmentation result. (c) U-net-SG Segmentation result. (d) U-net-P Segmentation result. (e) U-net-SP Segmentation result. (f) Proposed AWDS-net Segmentation result.


Figure 8. Segmentation comparison of multiple small masses. (a) Tagged original image. (b) U-net-S Segmentation result. (c) U-net-SG Segmentation result. (d) U-net-P Segmentation result. (e) U-net-SP Segmentation result. (f) Proposed AWDS-net Segmentation result.


Figure 9. Segmentation comparison of multiple large masses. (a) Tagged original image. (b) U-net-S Segmentation result. (c) U-net-SG Segmentation result. (d) U-net-P Segmentation result. (e) U-net-SP Segmentation result. (f) AWDS-net Segmentation result.


From the segmentation results for a single small mass in Figure 6, the networks with the added SMEM mechanism identify small masses effectively, while U-net-P, which lacks SMEM, fails to recognise them. Networks with attention mechanisms, such as U-net-SG and AWDS-net, distinguish masses from background more accurately.

The results of large mass segmentation are shown in Figure 7, where AWDS-net performs best. The comparative results for multiple small masses are shown in Figure 8. U-net-S and U-net-SG, which add the SMEM and attention mechanisms to U-net, recognise small masses but miss some of the multiple masses. U-net-P and U-net-SP, which adopt the SPM mechanism, can detect multiple masses and effectively supplement multi-scale feature extraction of small masses, compensating for the missed detections. AWDS-net adds SMEM, SPM and the attention mechanism to U-net, and its segmentation result is closest to the ground-truth tag.

The segmentation comparison of different networks for multiple large masses is shown in Figure 9. U-net-S and U-net-SG lack the SPM module and cannot identify multiple large masses. After adding the SPM mechanism, U-net-P, U-net-SP and AWDS-net improve the segmentation quality of multiple large masses.

4.3. Better performance than state-of-the-art

To evaluate the performance of the proposed AWDS-net, we compare it with several advanced deep learning-based image segmentation models, including U-Net (Ronneberger et al., Citation2015), Attention-Unet (Oktay et al., Citation2018), SegNet (Badrinarayanan et al., Citation2017), R2U-Net (Alom et al., Citation2018) and SCSGNet (Li et al., Citation2023), all of which are typical encoder-decoder network structures. The experimental results are shown in Table 2.

Table 2. Comparison of experimental results of different segmentation methods.

The proposed AWDS-net takes advantage of the popular U-net structure and enhances the encoder-decoder to characterise diverse masses in breast images for whole-field automatic segmentation. Compared with U-net and the other networks, AWDS-net improves every metric, with Attention-Unet the closest competitor by a small margin. AWDS-net achieves accuracy, sensitivity, specificity, precision, IoU and F1-score of 0.9926, 0.8269, 0.9976, 0.6104, 0.5412 and 0.7023, respectively. Compared with these methods, the improvements of the proposed AWDS-net are 0.14% ∼ 3.16% in accuracy, 4.67% ∼ 20.59% in sensitivity, 0.16% ∼ 5.23% in specificity, 4.17% ∼ 10.27% in precision, 5.1% ∼ 15.08% in IoU and 4.45% ∼ 14.21% in F1-score.

The segmentation quality improvements of our proposed AWDS-net rely on the three joint modules: SMEM characterises small masses well, SPM improves multi-scale mass information extraction, and AG effectively considers local mass information together with global breast information. For example, the basic U-net and SegNet ignore multi-scale mass information extraction, while Attention-Unet and R2U-Net cannot characterise small masses very well. Therefore, the proposed AWDS-net outperforms the advanced methods in precise mass segmentation.

To compare with the latest works Unet++ (Zhou et al., Citation2019), TransUNet (Chen et al., Citation2021), Arf-net (Xu et al., Citation2022) and SCSGNet (Li et al., Citation2023) while accounting for dataset differences, we normalise each method’s metrics to its U-net baseline (U-net = 1), aligning with state-of-the-art (SOTA) methods indirectly. The advanced SCSGNet combines global context extraction with local boundary refinement for diverse masses and ambiguous boundaries. The relative SE (sensitivity), SP (specificity) and F1 (Dice coefficient) of the proposed AWDS-net and the four SOTA methods are shown in Figure 10. There is almost no gap between the proposed AWDS-net and the advanced methods in specificity, while AWDS-net brings a higher F1-score and sensitivity thanks to the three additional modules in U-net. Arf-net handles small masses with complex adaptive receptive fields but lacks an effective attention mechanism in the decoder, while SCSGNet is also more complex, requiring more space to store map information and more time to iteratively update the feature map and refine the boundary. Therefore, AWDS-net is a cost-effective model for segmenting diverse masses.

Figure 10. Relative comparison of the proposed AWDS-net with SOTA methods by aligning with the basic U-net.


4.4. Speed limitations over state-of-the-art

Besides segmentation performance, time cost and required space are also very important for breast tumour segmentation. Compared with commonly used deep learning segmentation networks, the proposed AWDS-net achieves higher segmentation quality at the acceptable expense of slightly longer training time and a larger model size, as shown in Table 3. Even though the model size increases by 30% over the basic U-net, the training time remains around 2 h. Therefore, AWDS-net is suitable for whole-field segmentation of multiple masses, and sacrificing some training time for higher segmentation performance is acceptable.

Table 3. Comparison of training time and model size of different segmentation methods.

4.5. Discussion

Based on the ablation study, the three new modules SMEM, SPM and AG in AWDS-net perform effectively for breast image segmentation. More importantly, both the intuitive segmentation results and the quantitative results demonstrate their benefits in improving segmentation performance for various masses.

Furthermore, AWDS-net takes advantage of SMEM and SPM to characterise multi-scale mass information and of AG to focus on mass-relevant information in global breast images. Therefore, it achieves higher breast mass segmentation performance than state-of-the-art methods on metrics such as SE, SP, PPV, IoU and F1.

Besides the higher segmentation performance of our proposed AWDS-net, its limitations lie in two aspects. On the one hand, the new SMEM, SPM and AG modules improve segmentation quality but also bring longer training time and a larger model. On the other hand, the segmentation work has not been verified against classification results in a complete diagnosis flow. These limitations of the performance-speed trade-off and more complete verification will be addressed in future work.

5. Conclusion

In this paper, we propose the whole-field automatic segmentation network AWDS-net, which adds the new SMEM and SPM modules to the U-net encoder to capture the refined information of small masses at multiple scales, so that the network can effectively identify and segment multiple masses in the same image. Importantly, an AG module is incorporated to enhance the network’s attention to target masses and to increase the segmentation accuracy metrics. Comprehensive experiments and an ablation study on the open-source breast cancer dataset DDSM demonstrate the effectiveness of our proposed AWDS-net over advanced works in accurate and fast breast image segmentation. AWDS-net can help doctors diagnose breast cancer accurately and quickly, improving the survival probability of patients.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This research was funded by Shanghai Pujiang Program numbered 21PJD026.

Notes on contributors

Jiajia Jiao

Jiajia Jiao is an associate professor at Shanghai Maritime University and her research interests include machine learning-assisted medical image analysis and computer optimisation.

Yingzhao Chen

Yingzhao Chen was an M.S. student at Shanghai Maritime University and her research interests include machine learning-assisted medical image analysis.

Zhiyu Li

Zhiyu Li is a doctor at Shanghai East Hospital and Tongji University School of Medicine. Her research interest focuses on medical image processing and analysis.

Tien-Hsiung Weng

Tien-Hsiung Weng is with Providence University and his research interests include machine learning-assisted applications.

References

  • Alom, M. Z., Hasan, M., Yakopcic, C., Taha, T. M., & Asari, V. K. (2018). Recurrent residual convolutional neural network based on u-net (r2u-net) for medical image segmentation. arXiv preprint arXiv:1802.06955. https://doi.org/10.48550/arXiv.1802.06955.
  • Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12), 2481–2495. https://doi.org/10.1109/TPAMI.2016.2644615
  • Chen, G., Li, L., Dai, Y., Zhang, J., & Yap, M. H. (2023). AAU-Net: An adaptive attention U-Net for breast lesions segmentation in ultrasound images. IEEE Transactions on Medical Imaging, 42(5), 1289–1300. https://doi.org/10.1109/TMI.2022.3226268
  • Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., Lu, L., Yuille, A. L., & Zhou, Y. (2021). Transunet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306. https://doi.org/10.48550/arXiv.2102.04306
  • Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2017). Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4), 834–848. https://doi.org/10.1109/TPAMI.2017.2699184
  • Chowdary, G. J., & Yoagarajah, P. (2023). EU-Net: Enhanced U-shaped network for breast mass segmentation. IEEE Journal of Biomedical and Health Informatics, 1–11. https://doi.org/10.1109/JBHI.2023.3266740
  • Cireşan, D. C., Giusti, A., Gambardella, L. M., & Schmidhuber, J. (2013, September 22–26). Mitosis detection in breast cancer histology images with deep neural networks. Medical image computing and computer-assisted intervention–MICCAI 2013: 16th international conference, Nagoya, Japan, September 22-26, 2013, Proceedings, Part II 16. 411–418. https://doi.org/10.1007/978-3-642-40763-5_51
  • Di, J., Ma, S., Lian, J., & Wang, G. (2022, January 15–16 ). A U-Net network model for medical image segmentation based on improved skip connections. 14th international conference on measuring technology and mechatronics automation (ICMTMA), Changsha, People’s Republic of China, pp. 298–302. https://doi.org/10.1109/ICMTMA54903.2022.00064
  • Duffy, S. W., Tabár, L., Yen, A. M. F., Dean, P. B., Smith, R. A., Jonsson, H., Törnberg, S., Chen, S. L., Chiu, S. Y., Fann, J. C., Ku, M. M., Wu, W. Y., Hsu, C., Chen, Y., Svane, G., Azavedo, E., Grundström, H., Sundén, P., Leifland, K., … Chen, T. H. (2020). Mammography screening reduces rates of advanced and fatal breast cancers: Results in 549,091 women. Cancer, 126(13), 2971–2979. https://doi.org/10.1002/cncr.32859
  • Guo, Y. N., Dong, M., Yang, Z., Gao, X., Wang, K., Luo, C., Ma, Y., & Zhang, J. (2016). A new method of detecting micro-calcification clusters in mammograms using contourlet transform and non-linking simplified PCNN. Computer Methods and Programs in Biomedicine, 130, 31–45. https://doi.org/10.1016/j.cmpb.2016.02.019
  • Heath, M., Bowyer, K., Moore, R., & Kegelmeyer, W. P. (2000, June 11–14). The digital database for screening mammography. Proceedings of the 5th international workshop on digital mammography, 212–218.
  • Kamba, M., Manabe, M., Wakamiya, S., Yada, S., Aramaki, E., Odani, S., & Miyashiro, I. (2021). Medical needs extraction for breast cancer patients from question and answer services: Natural language processing-based approach. JMIR Cancer, 7(4), e32005. https://doi.org/10.2196/32005
  • Li, Q., Xu, J., Yuan, R., Zhang, Y., & Feng, R. (2023, June 04-10). Scsgnet: Spatial-correlated and shape-guided network for breast mass segmentation. ICASSP 2023-2023 IEEE international conference on acoustics, speech and signal processing (ICASSP), 1–5. https://doi.org/10.1109/ICASSP49357.2023.10096410
  • Litjens, G., Kooi, T., Bejnordi, B. E., Setio, A. A. A., Ciompi, F., Ghafoorian, M., van der Laak, J. A. W. M., van Ginneken, B., & Sánchez, C. I. (2017). A survey on deep learning in medical image analysis. Medical Image Analysis, 42, 60–88. https://doi.org/10.1016/j.media.2017.07.005
  • Long, J., Shelhamer, E., & Darrell, T. (2015, June 07-12). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE conference on computer vision and pattern recognition, 3431–3440. https://doi.org/10.1109/CVPR.2015.7298965
  • Maqsood, S., Damaševičius, R., & Maskeliūnas, R. (2022). TTCNN: A breast cancer detection and classification towards computer-aided diagnosis using digital mammography in early stages. Applied Sciences, 12(7), 3273. https://doi.org/10.3390/app12073273
  • Mulley, A. G., Trimble, C., & Elwyn, G. (2012). Stop the silent misdiagnosis: Patients’ preferences matter. BMJ, 345, e6572. https://doi.org/10.1136/bmj.e6572
  • Ng, K. H., & Muttarak, M. (2003). Advances in mammography have improved early detection of breast cancer. Journal-Hong Kong College of Radiologists, 6, 126–131.
  • Oktay, O., Schlemper, J., Folgoc, L. L., Lee, M., Heinrich, M., Misawa, K., Mori, K., McDonagh, S., Hammerla, N. Y., Kainz, B., Glocker, B., & Rueckert, D. (2018). Attention u-net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999. https://doi.org/10.48550/arXiv.1804.03999.
  • Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 234–241). https://doi.org/10.1007/978-3-319-24574-4_28
  • Shareef, B., Xian, M., & Vakanski, A. (2020, April 03–07). Stan: Small tumor-aware network for breast ultrasound image segmentation. 2020 IEEE 17th international symposium on biomedical imaging (ISBI), 1–5. https://doi.org/10.1109/isbi45749.2020.9098691
  • Wang, Q., Wu, B., Zhu, P., Li, P., Zuo, W., & Hu, Q. (2020, June 13-19). ECA-Net: Efficient channel attention for deep convolutional neural networks. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 11534–11542. https://doi.org/10.1109/CVPR42600.2020.01155
  • Xu, C., Qi, Y., Wang, Y., Lou, M., Pi, J., & Ma, Y. (2022). ARF-Net: An adaptive receptive field network for breast mass segmentation in whole mammograms and ultrasound images. Biomedical Signal Processing and Control, 71, 103178. https://doi.org/10.1016/j.bspc.2021.103178
  • Yang, Z., Dong, M., Guo, Y., Gao, X., Wang, K., Shi, B., & Ma, Y. (2016). A new method of micro-calcifications detection in digitized mammograms based on improved simplified PCNN. Neurocomputing, 218, 79–90. https://doi.org/10.1016/j.neucom.2016.08.068
  • Yin, H., & Shao, Y. (2023). CFU-Net: A coarse-fine U-Net with multi-level attention for medical image segmentation. IEEE Transactions on Instrumentation and Measurement, 72, 1–12. https://doi.org/10.1109/TIM.2023.3293887
  • Zarbakhsh, P. (2023). Spatial attention mechanism and cascade feature extraction in a U-Net model for enhancing breast tumor segmentation. Applied Sciences, 13(15), 8758. https://doi.org/10.3390/app13158758
  • Zebari, D. A., Zeebaree, D. Q., Abdulazeez, A. M., Haron, H., & Hamed, H. N. A. (2020). Improved threshold based and trainable fully automated segmentation for breast cancer boundary and pectoral muscle in mammogram images. IEEE Access, 8, 203097–203116. https://doi.org/10.1109/ACCESS.2020.3036072
  • Zhao, H., Niu, J., Meng, H., Wang, Y., Li, Q., & Yu, Z. (2022, July 11–15). Focal U-Net: A focal self-attention based U-Net for breast lesion segmentation in ultrasound images. 44th annual international conference of the IEEE engineering in medicine & biology society (EMBC), Glasgow, Scotland, United Kingdom, 1506–1511. https://doi.org/10.1109/EMBC48229.2022.9870824
  • Zhou, Z., Siddiquee, M. M. R., Tajbakhsh, N., & Liang, J. (2019). UNet++: Redesigning skip connections to exploit multiscale features in image segmentation. IEEE Transactions on Medical Imaging, 39(6), 1856–1867. https://doi.org/10.1109/TMI.2019.2959609