Research Article

FANet: A deep learning framework for black and odorous water extraction

Article: 2234077 | Received 15 Apr 2022, Accepted 04 Jul 2023, Published online: 14 Jul 2023

ABSTRACT

Black and odorous water (BOW) is a common issue in rapidly urbanizing developing countries. Existing methods for extracting BOW from remote sensing images focus mainly on spectral information and ignore important spatial characteristics such as texture, context, and orientation. Deep learning has emerged as a powerful approach for BOW extraction, but its effectiveness is hindered by the limited amount of labeled data and the small proportion of target objects. In this paper, we propose a fully convolutional adversarial network (FANet) for end-to-end pixel-level semantic segmentation of BOW. FANet combines a fully convolutional network (FCN) with a larger receptive field and perceptual loss, and employs adversarial learning to enhance stability in the absence of sufficient data labels. The Normalized Difference BOW Index, which can reflect the higher spectral reflectance of BOW in the near-infrared band, is used together with RGB as the input of FANet. In addition, we create a standard BOW dataset containing 5100 Gaofen-2 image patches of 224 × 224 pixels. Evaluation of FANet on the BOW dataset using intersection over union and F1-score demonstrates its superiority over popular models such as FCN, U-net, and SegNet. FANet preserves the integrity, continuity, and boundaries of BOW, achieving superior performance in both the quantity and quality of BOW extraction.

Introduction

Black and odorous water (BOW) is a general term for water with unpleasant colors and foul odors (Cheng et al., Citation2006). The development of urban industry has resulted in the discharge of industrial and domestic wastewater, affecting urban rivers. Thus, the severity of the problem of BOW is increasing. The first black and odorous rivers were reported in China in the 1980s, in the Huangpu River in Shanghai (Summers, Citation1986). Since then, BOW has been observed in the Harbin and Songhua Rivers (Zhang et al., Citation2005). Nowadays, in China, BOW is becoming increasingly widespread and is observed in most cities (Yuan et al., Citation2020). Therefore, the control of BOW is now extensively supported by the government and has become a crucial part of urban environmental improvement.

Macroscopic monitoring and identification are prerequisites for governance (S. Chen et al., Citation2021). BOW extraction can be divided into three main methods: water sampling and chemical analysis (Cao et al., Citation2020), remote sensing image classification based on spectral characteristics (H. Yao et al., Citation2019), and deep learning (Shao et al., Citation2021). Among these, water sampling and chemical analysis are the most commonly used. Early research on abnormal water used this method to analyze “black” water, such as “black water mass” (Nichol, Citation1993a, Citation1993b), “black algae mass” (C. Hu et al., Citation2003, Citation2004), and “lake flooding”. However, considering the cost in manpower, materials, and financial resources, and the spatial limitations of the fixed-point sampling mode, water sampling and chemical analysis cannot fulfill the need for timely and large-scale BOW monitoring.

With the availability of domestic and international remote sensing products, remote sensing technology has gradually become an effective BOW extraction technique based on spectral features, owing to its low cost, high efficiency, and spatially continuous coverage (Huang & Zheng, Citation2019; Wang et al., Citation2012). With the in-depth study of abnormal water, the formation mechanisms and optical properties of black water have gradually attracted attention. There are certain differences between the spectral curves and other apparent optical characteristics of BOW and normal water (Z. Li et al., Citation2015). Based on these differences, various identification methods for abnormal water have been proposed and analyzed. Based on the low reflectivity of Landsat-7 ETM+ in the blue, green, and red bands, absolute threshold conditions for extracting black water have been proposed (X. Li et al., Citation2012). MODIS and SeaWiFS images were used to extract the black water events off the Florida reefs in 2002 and 2012 based on the 443 nm radiance and the CDOM absorption coefficient (J. Zhao et al., Citation2013). By collecting water samples from Nanjing City and analyzing the water quality parameters and spectral characteristics of BOW in high-resolution remote sensing images, methods for extracting BOW were developed, such as the band threshold method, interpolation method, ratio method, and colorimetric method (Wen et al., Citation2018). The aforementioned extraction models rely on the analysis of the spectral and image characteristics of BOW. Moreover, the subsequent empirical, semi-empirical, and analytical models are designed only from spectral features, such as the chromaticity method (Duan et al., Citation2014), threshold method (Y. Yao et al., Citation2019), and index method (Jiang et al., Citation2019); they manually extract low-level features of BOW and ignore the surrounding environment, resulting in weak generalization ability and performance that still requires improvement. Furthermore, this type of method requires experts with specialized knowledge to design the extraction algorithm, which is easily influenced by subjective choices and is difficult to generalize.

Deep learning can imitate the deep structure of neurons in the human brain (Liu et al., Citation2016; Ma et al., Citation2021). Through training on massive data, target features are captured automatically, and the trained model is used to conduct analyses and make judgments. Deep learning has achieved ideal results in water environment applications and has received extensive attention, for example in the simulation of heavy metals (Yaseen, Citation2021), prediction of water quality (Tung et al., Citation2020, Citation2021), detection of pollutants in sewer networks (Jiang et al., Citation2021, Citation2022), and inversion of water depth (Zheng et al., Citation2017, Citation2021). Image semantic segmentation based on deep learning aims to classify each pixel in an image into a corresponding category (Csurka & Perronnin, Citation2011). Convolutional neural networks (CNNs) have attracted considerable attention in the field of computer vision (He et al., Citation2015) and provide a new perspective for extracting target features from satellite images. Grangier et al. (Citation2009) used a CNN to perform semantic segmentation tasks in 2009, and Farabet et al. (Citation2013) later conducted related experiments. These earlier semantic segmentation methods pass image blocks directly through the CNN and usually label each pixel with the category of the object or region containing it, which is not ideal in terms of speed and accuracy. Long et al. (Citation2015) trained an end-to-end, point-to-point Fully Convolutional Network (FCN) that stacks multiple convolutional and pooling layers to gradually expand the receptive field, which benefits the extraction of target features (Shouno et al., Citation2015). Based on the FCN, U-net (Ronneberger et al., Citation2015) and SegNet (Badrinarayanan et al., Citation2017) were also proposed. U-net adopts an encoder-decoder structure and concatenates shallow and deep features: every step in the decoding path consists of an upsampling of the feature map, a concatenation with the correspondingly cropped feature map from the encoding path, and convolutions. High-level features are used to obtain the approximate location of the target, and low-level features contain the spatial structure details required for refining the boundary (Z. Chen et al., Citation2020; Y. Zhao et al., Citation2022). SegNet expands the receptive field and reuses the pooling results in the decoding process, avoiding the cost of learned up-sampling operations, reducing the number of parameters, and introducing additional encoding information. Despite the many differences among the improved FCN-based models, all of them predict the labels independently. Therefore, various post-processing methods to enhance spatial continuity have been explored (Koller & Friedman, Citation2009; Lin et al., Citation2016; Chen et al., Citation2014). However, most of these methods consist of two parts, a semantic segmentation model and a conditional random field, which loses the advantage of end-to-end training. Although higher-order potential parameters can be obtained through learning, their number is limited (Kohli et al., Citation2009; Luc et al., Citation2016). Generative Adversarial Networks (GANs) are a type of neural network introduced in 2014 by Goodfellow et al. that have revolutionized the field of generative modeling (Goodfellow et al., Citation2014).
A GAN consists of a generator and a discriminator. The generator learns to generate realistic data that can be mistaken for real data, while the discriminator tries to distinguish between real and generated data. This adversarial training allows the generator to improve and generate increasingly realistic data. In 2015, Radford et al. proposed an improvement to the GAN architecture called the Deep Convolutional GAN (DCGAN), which uses deep convolutional neural networks to generate realistic images (Radford et al., Citation2015). In the same year, Denton et al. proposed the Laplacian pyramid method to enhance the quality and detail of the generated images (Denton et al., Citation2015). In 2016, Luc et al. introduced GANs into semantic segmentation to improve the accuracy of pixel classification. At present, GANs are extensively employed in numerous research fields, such as data augmentation and semantic segmentation of high-resolution remote sensing imagery. GANs also provide a new idea for the extraction of BOW.

In summary, traditional BOW extraction is time-consuming, labor-intensive, and inefficient. Although deep learning provides a new method for BOW extraction, its precision is limited by the insufficient amount of labeled data and the small proportion of target objects. To solve these problems, this paper proposes a fully convolutional generative adversarial network (FANet) for BOW extraction. FANet consists of an FCN backbone encoder and adversarial learning, and is trained by optimizing a mixed loss. To improve the extraction accuracy of BOW, we adjusted the structure of the FCN, enhanced the input information, and optimized the loss function. Finally, we captured deeper features and realized end-to-end pixel-level image segmentation, which provides an automated and precise solution for BOW extraction.

Materials

Study area

In this study, the urban built-up area of Taiyuan City, Shanxi Province, was selected as the study area. Taiyuan City is located in the middle of the Yellow River Basin in North China. The second-largest tributary of the Yellow River, the Fen River, runs through the city from north to south, with many tributaries evenly distributed across Taiyuan City. Taiyuan City has a continental climate of the northern temperate zone with four distinct seasons: abundant sunshine, concentrated rainfall in summer and autumn, and drought in winter and spring. Annual precipitation is relatively low (approximately 450 mm). These conditions are conducive to the formation of BOW. Furthermore, the rapid development of heavy industry in Taiyuan, especially the mining industry, has caused serious river water pollution. Taiyuan is also densely populated, and the discharge of domestic wastewater has increased the severity of the BOW problem in the city.

In this study, to avoid too many background labels in the dataset, we further narrowed the scope and selected the area where BOW was concentrated. The geographical coordinates considered were 112°23′56″E–112°37′28″E, 37°45′32″N–38°59′20″N (Figure 1).

Figure 1. The distribution of water in the study area is from the GF-2 and the true color composite of the red, green, and blue bands. The two dataframes on the right are the enlarged display of the part of the left map.


Datasets

A large amount of ground verification data is required for auxiliary training in supervised deep learning semantic segmentation experiments. Obtaining such ground verification data only through traditional biochemical detection methods is time-consuming, laborious, and inefficient. As the relevant departments have paid increasing attention to the treatment of urban BOW, localities have conducted BOW inspections, compiled BOW treatment accounts, and released them to the public (https://www.ipe.org.cn/MapWater/water_2.html?q=2). This information source is the most important basis for determining the precise spatial distribution of BOW on the ground. Based on field investigations, image analysis, relevant news reports, and a large amount of social media information collected through web crawlers, 15 black and odorous rivers and ponds were finally confirmed as research objects (Table 1).

Table 1. List of BOW in the study area.

The black and odorous rivers in the city are relatively narrow, usually less than 2 m wide. We therefore selected high-resolution Gaofen-2 (GF-2) images as the research data. The GF-2 satellite, launched on 19 August 2014, is China’s first civilian optical remote sensing satellite with a spatial resolution better than 1 m. The satellite is equipped with two cameras providing 1 m resolution panchromatic and 4 m resolution multispectral imagery. Its meter-level spatial resolution, high positioning accuracy, and fast attitude maneuverability have effectively improved the comprehensive observation efficiency of the satellite, reaching an advanced international level. Considering the satellite transit time, sunshine conditions, imaging quality, cloud and snow cover (January, February, March, December), and the flood seasons of rivers and lakes (July, August, September), we selected three images taken on 6 May 2017. The basic information of these images is provided in Table 2.

Table 2. List of GF-2 images.

Combined with the BOW treatment ledger compiled by the relevant national departments, all images in the dataset were manually identified and marked, and the remote sensing images were segmented using a sliding-window method into patches of 224 × 224 pixels. Based on this process, a BOW dataset (BOWD) was built that contains the manually marked pixel-level labels and the original GF-2 images; the whole process is shown in Figure 2.
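For reference, the sliding-window cropping described above can be sketched as follows. The 224 × 224 patch size comes from the text, while the stride, array layout, and function name are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def sliding_window_patches(image, label, patch_size=224, stride=224):
    """Crop an image/label pair into fixed-size patches with a sliding window.

    image: (H, W, C) array of a GF-2 scene; label: (H, W) array of class ids.
    The 224 x 224 patch size follows the paper; the stride is an assumption.
    """
    patches = []
    h, w = label.shape
    for top in range(0, h - patch_size + 1, stride):
        for left in range(0, w - patch_size + 1, stride):
            img_patch = image[top:top + patch_size, left:left + patch_size, :]
            lbl_patch = label[top:top + patch_size, left:left + patch_size]
            patches.append((img_patch, lbl_patch))
    return patches
```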

Figure 2. Data preprocessing and BOWD construction.


Data augmentation generally improves performance, making it an effective and simple technique for deep learning network training (F. Hu et al., Citation2015). It can also be used as a substitute when resources are constrained, achieving similar or better performance than regularized models while being more adaptable to changes in architecture and training data. Therefore, this study used horizontal flipping; vertical flipping; and counterclockwise rotations of 90°, 180°, and 270° to expand the training dataset. Figure 3 shows an original image and its corresponding labels after expansion. After data augmentation, 5100 sets of data were obtained. Because of the small number of images in this dataset, and referring to the literature (Prahs et al., Citation2018; Serte & Demirel, Citation2021), the training and validation sets were randomly generated at a ratio of 9:1. Finally, we split these 5100 sets into 4590 sets of images for the training dataset and 510 sets for the validation dataset. The training dataset is used to optimize the parameters of the deep learning model; by training on a large and diverse dataset, the model can learn the underlying patterns and relationships between the input and output. The validation dataset is used to monitor the performance of the model during the training process: the model is evaluated on it periodically to ensure that it is not overfitting to the training data.
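The five geometric transformations listed above (horizontal and vertical mirroring plus 90°, 180°, and 270° counterclockwise rotations) can be reproduced with simple array operations. The sketch below is a minimal illustration; the function name and array layout are assumptions.

```python
import numpy as np

def augment_pair(image, label):
    """Return the original sample plus its five augmented copies.

    image: (H, W, C) array; label: (H, W) array. Flips and rotations are
    applied identically to image and label so the pixel-level annotation
    stays aligned.
    """
    samples = [(image, label)]
    samples.append((np.fliplr(image).copy(), np.fliplr(label).copy()))  # horizontal mirror
    samples.append((np.flipud(image).copy(), np.flipud(label).copy()))  # vertical mirror
    for k in (1, 2, 3):                                                 # 90, 180, 270 deg CCW
        samples.append((np.rot90(image, k).copy(), np.rot90(label, k).copy()))
    return samples
```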

Figure 3. Dataset expansion. (a) Original image; (b) horizontal mirror; (c) vertical mirror; (d) 90° counterclockwise rotation; (e) 180° counterclockwise rotation; (f) 270° counterclockwise rotation; (I) normal water expansion; (II) BOW expansion.


Methods

In this paper, FANet is proposed to compensate for the impact of insufficient data labels and a small proportion of target objects on the segmentation results, and to provide an automated, highly precise solution for BOW extraction. We used the FCN, with information enhancement and loss function optimization, as the generator to capture deeper features. Meanwhile, we added a discriminator to reduce misclassification and realize end-to-end pixel-level image segmentation. The corresponding framework is illustrated in Figure 4.

Figure 4. Flow diagram of extracting the BOW mainly including remote sensing data processing, BOWD construction and FANet proposal.


Overview of the proposed network

Existing models usually predict the category of each pixel in image segmentation, and the pixel-level accuracy may be very high. However, they easily ignore the relationships between pixels, which makes the segmentation result insufficiently continuous or distinct, so that the size and shape of an object differ from those in the real label. An adversarial network structure can be used to detect and correct high-order inconsistencies between real and segmented labels (Luc et al., Citation2016). Therefore, this paper proposes a fully convolutional generative adversarial network (FANet), in which the optimized FCN is regarded as the generator of a GAN; in other words, a discriminator is simply added, and there is no need to construct a separate generator.

The main framework of FANet is shown in . In this study, we used the optimized FCN model as the generator and built a shallow discriminator to distinguish between the real and predicted images generated by the optimized FCN network. At the same time, this experiment used a multiclass cross-entropy loss and a weighted loss function based on the discriminator as the loss function to optimize the model. Without a complicated loss function, it was sufficient to judge whether the generated images were false and the real labels were true.

Figure 5. The architecture of FANet.


Input of module

When a certain pollutant in the water increases, the density of the water changes, and the water shows different colors, temperatures, and transparency. Moreover, the ability of the water to absorb and reflect solar radiation changes, causing the reflectivity to change. As a result, remote sensing images show different textures, colors, and channel reflection values (Y. Wu et al., Citation2021). Therefore, based on the remote sensing characteristics of different types of water, in this study we explored band and index combinations to distinguish the water types and establish a qualitative or quantitative target water extraction model. The frequencies of all pixel values in the regions of interest were calculated for BOW and normal water (Figure 6). We found that the pixel values of BOW were mostly concentrated between 1500 and 4000 in Band 4 (the near-infrared band), where those of normal water were rarely distributed, which implies that the near-infrared band can help distinguish between normal water and BOW.

Figure 6. Pixel frequency statistics of waters in various bands (the display effect is enhanced: the remote sensing reflectance is magnified by 1000 times).


To obtain the spectral reflectance characteristics of different waters, we selected 30 typical sample points from the BOW and normal water samples and calculated the average spectral reflectance of each band (Figure 7). We observed that the reflectivity of BOW in Band 1 and Band 2 was lower than that of normal water. The difference of BOW between Band 2 and Band 3 was smaller than that of normal water, that is, its spectral slope was small. Based on these spectral differences, various BOW extraction indices could be constructed.

Figure 7. Average reflectance rate of different types of water in GF-2 images.


Based on the characteristics of the BOW spectrum, to highlight the difference between BOW and normal water by leveraging subtraction and division, the Normalized Difference Black and Odorous Water Index (NDBWI) was constructed (Wen et al., Citation2018), as shown in Equation (1).

(1) $\mathrm{NDBWI} = \frac{R_G - R_R}{R_G + R_R}$

where $R_G$ and $R_R$ denote the remote sensing reflectance in the green and red bands, respectively.

The Black and Odorous Water Index (BOI) was also proposed based on the spectral characteristics of BOW, because the change of BOW from the green band to the red band is less obvious than that of normal water (Y. Yao et al., Citation2019), as shown in Equation (2).

(2) $\mathrm{BOI} = \frac{R_G - R_R}{R_B + R_G + R_R}$

where $R_B$ denotes the remote sensing reflectance in the blue band.

In the field of semantic segmentation, scholars usually use image-processing methods to enhance certain information and improve performance (Tian et al., Citation2020). According to the characteristics of the small spectral slope of BOW between the red and green bands, the BOI index can effectively separate BOW from normal water. Because normal water is closely related to the surrounding vegetation coverage, the NDBWI can capture this feature. At the same time, to verify the feasibility of separating normal water from BOW in the near-infrared band, multispectral high-resolution images were converted into RGB ordinary images, and the differences in the segmentation effects of the two methods were compared.

Based on the aforementioned comparison, we selected different band or index combinations to enhance the BOW extraction. The five band or index combinations used as the input of the adjusted FCN were as follows: (C1) RGB, (C2) RGB + NIR, (C3) NIR + NDBWI + BOI, (C4) RGB + NDBWI, and (C5) NDVI + NDBWI + BOI.
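For illustration, the two indices in Equations (1) and (2) and one of the input stacks (C4, RGB + NDBWI) can be computed directly from the GF-2 bands as in the sketch below; the small epsilon guard and the channel ordering are implementation assumptions.

```python
import numpy as np

def ndbwi(green, red, eps=1e-6):
    """Normalized Difference Black and Odorous Water Index, Equation (1)."""
    return (green - red) / (green + red + eps)

def boi(blue, green, red, eps=1e-6):
    """Black and Odorous Water Index, Equation (2)."""
    return (green - red) / (blue + green + red + eps)

def build_c4_input(blue, green, red):
    """Stack RGB + NDBWI (combination C4) as a 4-channel network input."""
    return np.stack([red, green, blue, ndbwi(green, red)], axis=0)
```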

Generator

The FCN structure was adjusted in this study, and the adjusted structure is shown in Figure 8. First, features were extracted using the first 15 layers of VGG19 (Simonyan & Zisserman, Citation2015). Next, the last three fully connected layers of VGG19 were replaced by convolution kernels of different sizes. To enable the network to capture larger spatial features while still capturing details, the first of these convolution layers no longer used a 3 × 3 kernel; it was replaced with a 7 × 7 convolution kernel. The two remaining convolution layers both used 1 × 1 kernels. To prevent the model from overfitting, we added a dropout (Srivastava et al., Citation2014) layer after each of the first two convolutional layers. After repeated training, we observed that the network worked best with a dropout rate of 25%. After 18 layers of convolutional feature extraction, a high-level semantic feature map of size 7 × 7 was obtained from the 224 × 224 input image. Finally, after three deconvolution operations, the image was restored to the original size.
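A minimal PyTorch sketch of the adjusted FCN generator described above is given below: a VGG-19 convolutional backbone, the fully connected layers replaced by one 7 × 7 and two 1 × 1 convolutions with 25% dropout after the first two, and three transposed convolutions restoring the 224 × 224 resolution. The channel widths, the exact slicing of the VGG-19 feature extractor, and the upsampling strides are assumptions, not the authors' released configuration.

```python
import torch.nn as nn
from torchvision import models

class AdjustedFCN(nn.Module):
    """Sketch of the adjusted FCN generator with a larger receptive field."""

    def __init__(self, in_channels=4, num_classes=3):
        super().__init__()
        vgg = models.vgg19(weights=None)
        # Reuse the VGG-19 convolutional feature extractor (the paper keeps its
        # first 15 layers); the full feature part is used here so that a
        # 224 x 224 input yields a 7 x 7 feature map.
        self.backbone = vgg.features
        if in_channels != 3:  # e.g. RGB + NDBWI input (combination C4)
            self.backbone[0] = nn.Conv2d(in_channels, 64, kernel_size=3, padding=1)
        # Fully connected layers replaced by convolutions: one 7x7, then two 1x1,
        # with 25% dropout after the first two, as described in the text.
        self.head = nn.Sequential(
            nn.Conv2d(512, 1024, kernel_size=7, padding=3), nn.ReLU(inplace=True), nn.Dropout2d(0.25),
            nn.Conv2d(1024, 1024, kernel_size=1), nn.ReLU(inplace=True), nn.Dropout2d(0.25),
            nn.Conv2d(1024, num_classes, kernel_size=1),
        )
        # Three transposed convolutions restore the 7 x 7 map to 224 x 224.
        self.upsample = nn.Sequential(
            nn.ConvTranspose2d(num_classes, num_classes, kernel_size=4, stride=4),
            nn.ConvTranspose2d(num_classes, num_classes, kernel_size=4, stride=4),
            nn.ConvTranspose2d(num_classes, num_classes, kernel_size=2, stride=2),
        )

    def forward(self, x):
        x = self.backbone(x)     # (B, 512, 7, 7) for a 224 x 224 input
        x = self.head(x)         # (B, num_classes, 7, 7)
        return self.upsample(x)  # (B, num_classes, 224, 224)
```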

Figure 8. Adjusted FCN with larger receptive field.


Loss function

Weighted cross-entropy loss based on class balance

There may be multiple targets in image-segmentation tasks. However, when the proportion of target pixels among all pixels is too small, the problem of imbalanced categories becomes prominent, which increases the difficulty of network training and significantly reduces the segmentation effect for small target objects (Tan et al., Citation2021). BOW accounts for a low proportion of pixels because of its narrow spatial characteristics: the number of background pixels (6,979,490) is approximately 21 times the number of BOW pixels (332,440), and the number of normal water pixels (946,075) is approximately three times that of BOW.

For common two-category semantic segmentation, the general loss function is given by Equation (3).

(3) $L_{bce}(x, y) = -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i \log f_i(x_i, y_i) + (1 - y_i)\log\big(1 - f_i(x_i, y_i)\big)\Big]$

where $N$ denotes the number of samples participating in the classification, $y_i$ is the true category of input $x_i$, and $f_i(x, y)$ is the probability that $x$ belongs to category $y$. The closer the loss function is to 0, the closer the classification result is to the real situation. This loss function pays the same attention to each category, which is unfair to the smaller segmentation categories and makes the segmentation result susceptible to category imbalance.

Therefore, in this study, we introduced weight coefficients to increase the attention of the network to small targets; the weighted cross-entropy loss function is shown in Equation (4). Through comparative experiments, we found that the segmentation result for BOW was best when the weights of the background, normal water, and BOW were set to 1:4:8.

(4) $L_{wbce}(x, y, v) = -\frac{1}{N}\sum_{i=1}^{N} v_i\Big[y_i \log f_i(x_i, y_i) + (1 - y_i)\log\big(1 - f_i(x_i, y_i)\big)\Big]$

where $v_i$ is the weight of category $i$, given by Equation (5).

(5) $v_i = \frac{N - \sum_{n} f_i(x_i, y_i)}{\sum_{n} f_i(x_i, y_i)}$
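As a sketch, the class-balanced weighting can be realized in PyTorch with a per-class weight vector set to the 1:4:8 ratio reported above for background, normal water, and BOW; the use of `nn.CrossEntropyLoss` here is an implementation assumption.

```python
import torch
import torch.nn as nn

# Class weights for background : normal water : BOW = 1 : 4 : 8 (from the text).
class_weights = torch.tensor([1.0, 4.0, 8.0])
weighted_ce = nn.CrossEntropyLoss(weight=class_weights)

# logits: (B, 3, H, W) raw network outputs; target: (B, H, W) class indices.
logits = torch.randn(2, 3, 224, 224)
target = torch.randint(0, 3, (2, 224, 224))
loss = weighted_ce(logits, target)
```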

Perceptual loss based on spatial structure

Semantic segmentation can be regarded as an image-conversion task. The difference between the converted image and the ground truth is often measured using a pixel-by-pixel loss function. However, this loss cannot capture the spatial difference between the output image and the ground truth when a few pixels are misplaced between the two images (Mosinska et al., Citation2018): although the visual perception is similar, the values at corresponding pixels will differ. Therefore, a perceptual loss function can be used to assist in optimizing the parameters of the image conversion and generating high-quality images.

Thus, in this study, we introduced a regularization term to improve performance. The regularization is based on the feature maps generated by the shallow layers of the VGG-19 network, because the features extracted by the shallow layers are the spatial texture features of the image and can therefore maintain its spatial information. The perceptual loss function is given by Equation (6).

(6) $L_{top}(x, y, w) = \sum_{n=1}^{N}\frac{1}{M_n W_n H_n}\sum_{m=1}^{M_n}\left\| l_n^m(y) - l_n^m\big(f(x, w)\big)\right\|_2^2$

where $f(x, w)$ is the segmentation result when image $x$ is processed with parameters $w$, $l_n^m$ is the $m$-th feature map of the $n$-th layer in the training network, $M_n$ is the number of channels in the $n$-th layer, and the size of each channel is $W_n \times H_n$.

The final loss function $L_{total}$ is composed of the weighted cross-entropy loss and the perceptual loss, as shown in Equation (7).

(7) $L_{total}(x, y, w, v) = L_{wbce}(x, y, w, v) + \mu L_{top}(x, y, w)$

where $\mu$ is a weight balancing the perceptual term.
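A sketch of the perceptual term in Equation (6), computed on shallow VGG-19 feature maps, is given below. Which shallow layers are tapped is an assumption; in this three-class setting the softmax prediction and the one-hot label map both have three channels, so they can be fed to the VGG-19 feature extractor directly.

```python
import torch
import torch.nn as nn
from torchvision import models

class PerceptualLoss(nn.Module):
    """Perceptual loss on shallow VGG-19 feature maps, mirroring Equation (6)."""

    def __init__(self, layer_indices=(2, 7)):  # shallow ReLU layers to tap: an assumption
        super().__init__()
        vgg = models.vgg19(weights=None).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.features = vgg
        self.layer_indices = sorted(layer_indices)

    def _maps(self, x):
        outs = []
        for i, layer in enumerate(self.features):
            x = layer(x)
            if i in self.layer_indices:
                outs.append(x)
            if i >= self.layer_indices[-1]:
                break
        return outs

    def forward(self, pred, target):
        # pred: softmax probabilities, target: one-hot labels, both (B, 3, H, W).
        loss = pred.new_zeros(())
        for fp, ft in zip(self._maps(pred), self._maps(target)):
            _, m, h, w = fp.shape
            # 1 / (M_n * W_n * H_n) times the squared L2 difference of feature maps.
            loss = loss + torch.sum((fp - ft) ** 2) / (m * h * w)
        return loss

# The total generator loss of Equation (7) would then be
# L_total = weighted_cross_entropy + mu * PerceptualLoss()(pred_probs, onehot_labels),
# where the value of mu is an assumption not specified here.
```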

Hybrid loss function

The hybrid loss function is defined as shown in Equation (8).

(8) $\ell(\theta_S, \theta_A) = \sum_{n=1}^{N}\ell_{mce}\big(S(x_n), y_n\big) - \lambda\Big[\ell_{bce}\big(A(x_n, y_n), 1\big) + \ell_{bce}\big(A(x_n, S(x_n)), 0\big)\Big]$

where $\theta_S$ and $\theta_A$ represent the parameters that need to be trained in the segmentation network and the adversarial network, respectively. $A(x, y) \in [0, 1]$ denotes the scalar probability with which the adversarial network predicts that $y$ is the ground truth label map of $x$, as opposed to being a label map produced by the segmentation model $S$. $\ell_{mce}(x, y) = -\big[y \ln \operatorname{sigmoid}(x) + (1 - y)\ln\big(1 - \operatorname{sigmoid}(x)\big)\big]$ is the multiclass cross-entropy loss for predictions $x$, which equals the negative log-likelihood of the target segmentation map $y$ represented using a one-hot encoding. Similarly, $\ell_{bce}(x, y) = -\big[y \ln x + (1 - y)\ln(1 - x)\big]$ is the binary cross-entropy loss. The ultimate goal is to make the segmentation labels generated by the segmentation network as close as possible to the real labels, that is, to minimize the loss of the segmentation network $S$ in order to update the parameters $\theta_S$. At the same time, the adversarial network should distinguish as accurately as possible between the label maps produced by the segmentation model and the ground truth label maps; in other words, it drives $A(x_n, y_n)$ toward 1 and $A(x_n, S(x_n))$ toward 0.
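One training step under the hybrid objective of Equation (8) could look like the following PyTorch sketch: the discriminator is updated to separate ground-truth label maps from generated ones, and the segmentation network is updated with the multiclass cross-entropy plus the adversarial term. The network interfaces, optimizers, and the value of lambda are assumptions.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()            # expects probabilities in [0, 1]
mce = nn.CrossEntropyLoss()   # multiclass cross-entropy on raw logits

def training_step(segmenter, discriminator, opt_s, opt_a, x, y_index, y_onehot, lam=0.1):
    """One FANet-style update; lam (lambda) balances the adversarial term (assumed value)."""
    # --- update the adversarial network A ---
    with torch.no_grad():
        fake = torch.softmax(segmenter(x), dim=1)
    real_score = discriminator(x, y_onehot)   # should tend toward 1 (ground truth)
    fake_score = discriminator(x, fake)       # should tend toward 0 (generated)
    loss_a = bce(real_score, torch.ones_like(real_score)) + \
             bce(fake_score, torch.zeros_like(fake_score))
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()

    # --- update the segmentation network S ---
    pred = segmenter(x)
    adv_score = discriminator(x, torch.softmax(pred, dim=1))
    # Segmentation loss plus the adversarial term that pushes A(x, S(x)) toward 1.
    loss_s = mce(pred, y_index) + lam * bce(adv_score, torch.ones_like(adv_score))
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()
    return loss_s.item(), loss_a.item()
```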

Results

Evaluation metrics

In this study, we mainly used the intersection over union (IoU) and F1-Score as the model performance evaluation criteria. There were K+1 categories in the image to be segmented.

For pixel-based BOW extraction, the number of true positive (TP) pixels forms the intersection area, and the sum of TP, false positive (FP), and false negative (FN) pixels forms the union. The pixel-wise IoU score is given by Equation (9).

(9) $\mathrm{IoU} = \frac{TP}{TP + FP + FN}$

The IoU has certain limitations for multi-category segmentation tasks: when the segmentation effect for individual targets is very poor, the overall statistics are distorted. Therefore, to a certain extent, the IoU cannot objectively and truly reflect the effect of the model. In this study, the F1-score calculated using Equation (10) was primarily used to compare the performance of the extraction networks. It is the harmonic mean of the precision given in Equation (11) and the recall given in Equation (12).

(10) $F1\text{-}score = \frac{2 \times precision \times recall}{precision + recall}$
(11) $precision = \frac{TP}{TP + FP}$
(12) $recall = \frac{TP}{TP + FN}$
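Equations (9)-(12) follow directly from the per-class pixel confusion counts, as in the small numpy sketch below; the class id used for BOW is an assumption.

```python
import numpy as np

def class_metrics(pred, target, cls):
    """Pixel-wise IoU, precision, recall, and F1-score for one class (Equations (9)-(12))."""
    tp = np.sum((pred == cls) & (target == cls))
    fp = np.sum((pred == cls) & (target != cls))
    fn = np.sum((pred != cls) & (target == cls))
    iou = tp / (tp + fp + fn + 1e-9)
    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    f1 = 2 * precision * recall / (precision + recall + 1e-9)
    return iou, precision, recall, f1

# Example: metrics for the BOW class, assuming it is encoded as class id 2.
# iou, p, r, f1 = class_metrics(prediction_map, label_map, cls=2)
```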

Experiment

Comparison of U-net, SegNet, FCN, and adjusted FCN as baseline

In the experiment, we used the BOWD RGB bands as input and conducted BOW extraction training on the adjusted FCN, FCN, U-net, and SegNet for comparison. Limited by the experimental environment, the experiment used batch training with a batch size of 8. For all training data, the loss and accuracy were recorded in each epoch, together with the overall accuracy and loss on the validation set. The results are shown in Table 3. By comparison, the segmentation effect for BOW using the adjusted FCN is significantly better than that of the FCN, U-net, and SegNet in terms of IoU and F1-score, and our model achieved the best performance, with an IoU of 0.7549 and an F1-score of 0.8215 in extracting BOW. Therefore, the adjusted FCN was used as the backbone network for further optimization in this study.

Table 3. Result in the validation set based on U-net, SegNet, FCN, and adjusted FCN.

Comparison of the band or index combinations as input

Figure 9 shows the overall accuracy and loss changes for BOW extraction on the training set for the five combinations: (C1) RGB, (C2) RGB + NIR, (C3) NIR + NDBWI + BOI, (C4) RGB + NDBWI, and (C5) NDVI + NDBWI + BOI. In general, the loss of the five combinations tended to stabilize after approximately 32 training epochs, and most of their F1-scores exceeded 0.85. Notably, the final loss values of the five combinations on the training set did not differ considerably, but the overall accuracy of C2 and C5 was the best, and C3 had the worst overall accuracy of all the combinations.

Figure 9. Band or index combinations comparison of accuracy and loss in the training set (Accuracy is the ratio of the number of correctly classified pixels to the total number of pixels).


To compare the influence of each combination as the input of the adjusted FCN model, we calculated the IoU and F1-score of the corresponding validation set ().

Table 4. Band or index combinations comparison of IoU and F1-Score in the validation set.

Table 4 shows that the aforementioned combinations exhibited good segmentation effects for normal water, with no obvious differences. However, the segmentation performance of the five combinations differed greatly for BOW, among which C4 was the best, with an IoU of 0.8023, followed by C5. C3 had the worst segmentation effect for BOW. This further demonstrates that the NIR band can serve as a crucial auxiliary component in BOW extraction.

Figure 10 shows the segmentation effects of the band or index combinations. According to the observations illustrated in Figure 10, the model performs best when C5 is used as input, with C4 a close second. While the performance of C1–C5 is comparably consistent for larger targets (columns 4 and 5), significant omissions and misclassifications appear for smaller targets (columns 1 and 2), specifically when C1–C3 are used as input.

Figure 10. BOW extraction from band or index combinations as input comparison in the validation set.


Comparison of loss functions

In this study, the weighted cross-entropy loss based on class balance and the perceptual loss based on spatial structure were investigated. Because the model structure did not change, the optimized FCN model with the C4 input was used directly in this part, and only the loss functions were varied, which effectively improved the training efficiency.

Table 5 shows the IoU and F1-score obtained with the different loss functions on the validation set. Both proposed loss functions improved performance over the prior experiment based on the cross-entropy loss. The segmentation of normal water reached a bottleneck, and its IoU and F1-score were difficult to improve further. The IoU of the perceptual loss function was 0.8259, an increase of approximately 2.94% compared with the cross-entropy loss and 2.13% compared with the weighted cross-entropy loss; the F1-score increased by 3.74% overall compared with the cross-entropy loss. Thus, both proposed loss functions improved BOW extraction.

Table 5. Loss function optimization comparison.

Figure 11 shows a comparison of the segmentation results after loss function optimization. The perceptual loss produced the best segmentation effect: the image patches were more complete, the segmentation results were more accurate, and the boundary was closer to that of the real label. Therefore, the segmentation effect for BOW gradually improved with the continuous superposition of the optimization functions.

Figure 11. BOW extraction from loss function comparison in the validation set.


Experiment of FANet

In this part, we developed FANet based on adversarial learning, conditioned on the inferences of the prior experiments. Table 6 shows the comparison results of the optimized FCN and the proposed FANet on the validation set. FANet extracted BOW with an IoU of 0.8564, an increase of approximately 0.031 compared with the optimized FCN, and with an F1-score of 0.9031. In general, FANet can improve the performance of target object extraction to a certain extent. Furthermore, FANet provides a new idea for semantic segmentation training with insufficient data labels.

Table 6. IoU and F1-Score in validation sets based on FANet and optimized FCN comparison.

Figure 12 shows a comparison between FANet and the optimized FCN. By comparing the predicted maps with the real labels, we found that the segmentation effect of FANet was better than that of the optimized FCN: it eliminated some of the broken patches of the FCN segmentation and solved the problem of spatial discontinuity to a certain extent (column 4).

Figure 12. FANet and optimized FCN segmentation effect comparison in the validation set.


Discussion

BOWD

Based on the three GF-2 remote sensing images and the BOW management accounts compiled by the relevant state departments, a BOWD was constructed, filling a gap in BOW datasets. However, the spatial distribution information of BOW in Taiyuan obtained from multiple channels remains relatively rough. The BOW obtained in this study is accurate only to the river level, but not all reaches of the same river exhibit black and odorous phenomena. This problem also exists in many BOW studies (S. Wu, Citation2019). Other methods (e.g. field investigation) could further refine the labels to the river-reach level and improve the quality of the BOWD. In addition, the number of images in the BOWD is insufficient, which affects the performance of the deep learning framework.

Feature representations for BOW

Traditional remote sensing image classification methods are mainly based on spectral information, which cannot accurately classify objects. Deep learning can extract spatial structures, such as texture, context, and orientation. Convolution operations at different depths can extract features at different image levels. The features learned by shallow convolution are simple and detailed, such as the edges, corners, texture, and geometric shapes, and the features learned by deep convolution are more complex and abstract.

In this study, operations such as information enhancement and loss function optimization were added to the adjusted FCN model. With the optimization of the FCN, the semantic information of BOW became increasingly abstract. Upon the introduction of adversarial learning, the extraction accuracy of BOW was further improved. Overall, these results demonstrate that the proposed FANet has strong robustness and reasoning abilities.

Ablation study

In this section, we discuss the ablation study conducted to verify the effectiveness of each key component of the proposed model. The ablation analysis was performed on the BOWD. As shown in Table 7, the adjusted FCN was adopted as the baseline model, and each module was then added progressively. The BOW IoU improved by 6.28% compared with the basic model when RGB + NDBWI was used as the input. This result demonstrates that the combination of RGB and the NDBWI provides more information on BOW. The results improved by a further 2.94% after the perceptual loss replaced the cross-entropy loss. The addition of the discriminator improved the BOW IoU from 0.8259 to 0.8564, which implies that the discriminator reduced incorrect recognition and connectivity problems. Finally, FANet achieved the best performance, demonstrating that each component is necessary for the proposed model to obtain the best BOW extraction results.

Table 7. Ablation study with component combinations in BOWD.

Conclusion

In this study, we created the BOWD and proposed FANet to realize the extraction of BOW. The experiments proved that the combination of RGB and the NDBWI as the input of the adjusted FCN could effectively improve the model’s ability to extract BOW. Next, we compared the cross-entropy loss function and the perceptual loss, and chose the perceptual loss as the generator’s loss function because of its strong ability to maintain the original spatial structure of the target object and improve segmentation performance. Finally, we conducted experiments on the BOWD, and the proposed framework was more automated and accurate for BOW extraction than traditional methods. This study represents a new attempt to extract BOW under the guidance of adversarial learning. In future work, we plan to study a more general deep learning framework applicable to data from different sources. In addition to network learning, we plan to mark the level of BOW by river section and add data sources to further improve the BOWD.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The BOW inspections conducted by localities can be acquired via https://www.ipe.org.cn/MapWater/water_2.html?q=2.

Additional information

Funding

This research is financially supported by the National Natural Science Foundation of China (Grant Nos. 42130309 and 41972066).

References

  • Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis & Machine Intelligence, 39(12), 2481–16. https://doi.org/10.1109/TPAMI.2016.2644615
  • Cao, J., Sun, Q., Zhao, D., Xu, M., Shen, Q., Wang, D., Wang, Y., & Ding, S. (2020). A critical review of the appearance of black-odorous waterbodies in China and treatment methods. Journal of Hazardous Materials, 385, 121511. https://doi.org/10.1016/j.jhazmat.2019.121511
  • Cheng, J., Wu, E., Che, Y., & Xu, Q. (2006). Study on key indicators for judging black and odorous water in area of plain river system. China Water Wastewater, 22, 18–22. https://doi.org/10.3321/j.issn:1000-4602.2006.09.005
  • Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2014). Semantic image segmentation with deep convolutional nets and fully connected crfs. 2015 3rd International Conference on Learning Representations (ICLR), San Diego, USA. https://doi.org/10.48550/arXiv.1412.7062
  • Chen, Z., Xu, Q., Cong, R., & Huang, Q. (2020). Global context-aware progressive aggregation network for salient object detection. Proceedings of the AAAI Conference on Artificial Intelligence, 34(7), 10599–10606. https://doi.org/10.1609/aaai.v34i07.6633
  • Chen, S., Zhao, W., & Liao, Z. (2021). Remote sensing identification of black-odor water bodies: A review. Remote Sensing for Natural Resources, 33, 20–29. https://doi.org/10.6046/gtzyyg.2020104
  • Csurka, G., & Perronnin, F. (2011). An efficient approach to semantic segmentation. International Journal of Computer Vision, 95(2), 198–212. https://doi.org/10.1007/s11263-010-0344-8
  • Denton, E., Chintala, S., Szlam, A., & Fergus, R. (2015). Deep generative image models using a Laplacian pyramid of adversarial networks. Advances in Neural Information Processing Systems, 28, 1486–1494. https://doi.org/10.48550/arXiv.1506.05751
  • Duan, H., Ma, R., Loiselle, S. A., Shen, Q., Yin, H., & Zhang, Y. (2014). Optical characterization of black water blooms in eutrophic waters. Science of the Total Environment, 482, 174–183. https://doi.org/10.1016/j.scitotenv.2014.02.113
  • Farabet, C., Couprie, C., Najman, L., & LeCun, Y. (2013). Learning hierarchical features for scene labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1915–1929. https://doi.org/10.1109/TPAMI.2012.231
  • Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 27, 2672–2680. https://doi.org/10.5555/2969033.2969125
  • Grangier, D., Bottou, L., & Collobert, R. (2009). Deep convolutional networks for scene parsing. 2009 26th International Conference on International Conference on Machine Learning (ICML), Montreal, Canada: ACM.
  • He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. 2015 International Conference on Computer Vision (ICCV), Santiago, Chile: IEEE.
  • Huang, Z., & Zheng, J. (2019). Extraction of black and odorous water based on aerial hyperspectral CASI image. 2019 39th International Geoscience and Remote Sensing Symposium (IGARSS), Yokohama, Japan: IEEE.
  • Hu, C., Hackett, K. E., Callahan, M. K., Andréfouët, S., Wheaton, J. L., Porter, J. W., & Muller Karger, F. E. (2003). The 2002 ocean color anomaly in the Florida Bight: A cause of local coral reef decline? Geophysical Research Letters, 30(3). https://doi.org/10.1029/2002GL016479
  • Hu, C., Muller Karger, F. E., Vargo, G. A., Neely, M. B., & Johns, E. (2004). Linkages between coastal runoff and the Florida keys ecosystem: A study of a dark plume event. Geophysical Research Letters, 31(15). https://doi.org/10.1029/2004GL020382
  • Hu, F., Xia, G., Hu, J., & Zhang, L. (2015). Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery. Remote Sensing, 7(11), 14680–14707. https://doi.org/10.3390/rs71114680
  • Jiang, Y., Li, C., Song, H., & Wang, W. (2022). Deep learning model based on urban multi-source data for predicting heavy metals (Cu, Zn, Ni, Cr) in industrial sewer networks. Journal of Hazardous Materials, 432, 128732. https://doi.org/10.1016/j.jhazmat.2022.128732
  • Jiang, Y., Li, C., Zhang, Y., Zhao, R., Yan, K., & Wang, W. (2021). Data-driven method based on deep learning algorithm for detecting fat, oil, and grease (FOG) of sewer networks in urban commercial areas. Water Research, 207, 117797. https://doi.org/10.1016/j.watres.2021.117797
  • Jiang, Y., Zhou, N., Zhou, Y., Huang, R., & Zhong, Z. (2019). Research on remote sensing monitoring of urban black and odorous water. Bulletin of Surveying & Mapping, 98–104. https://doi.org/10.13474/j.cnki.11-2246.2019.0522
  • Kohli, P., Ladicky, L., & Torr, P. H. S. (2009). Robust higher order potentials for enforcing label consistency. International Journal of Computer Vision, 82(3), 302–324. https://doi.org/10.1007/s11263-008-0202-0
  • Koller, D., & Friedman, N. (2009). Undirected graphical models. In D. Koller (Ed.), Probabilistic graphical models: Principles and techniques (pp. 101–105). CRC Press.
  • Li, Z., Duan, H., Shen, Q., Zhang, Y., & Ma, R. (2015). The changes of water color induced by chromophoric dissolved organic matter (CDOM) during the formation of black blooms. Journal of Lake Science, 27(4), 616–622. https://doi.org/10.18307/2015.0408
  • Li, X., Niu, Z., Jiang, S., & Jin, Y. (2012). Satellite remote sensing monitoring of black color water blooms in lake Taihu. The Administration and Technique of Environmental Monitoring, 24(2), 12–17. https://doi.org/10.3969/j.issn.1006-2009.2012.02.003
  • Lin, G., Shen, C., Van Den Hengel, A., & Reid, I. (2016). Efficient piecewise training of deep structured models for semantic segmentation. 2016 30th Computer Vision and Pattern Recognition (CVPR), Nevada, USA, IEEE.
  • Liu, D., Li, S., & Cao, Z. (2016). State-of-the-art on deep learning and its application in image object classification and detection. Computer Science, 43(12), 13–23. https://doi.org/10.11896/j.issn.1002-137X.2016.12.003
  • Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. 2015 29th Computer Vision and Pattern Recognition (CVPR), Boston, USA: IEEE.
  • Luc, P., Couprie, C., Chintala, S., & Verbeek, J. (2016). Semantic segmentation using adversarial networks. Advances in Neural Information Processing Systems, 1–12. https://doi.org/10.48550/arXiv.1611.08408
  • Ma, X., Wang, L., Qi, K., & Zheng, G. (2021). Remote sensing image scene classification method based on multi-scale cyclic attention network. Earth Science, 46(10), 3740–3752. https://doi.org/10.3799/dqkx.2020.365
  • Mosinska, A., Marquez-Neila, P., Koziński, M., & Fua, P. (2018). Beyond the pixel-wise loss for topology-aware delineation. 2018 32th Computer Vision and Pattern Recognition (CVPR), Salt Lake City, Utah: IEEE.
  • Nichol, J. E. (1993a). Remote sensing of tropical blackwater rivers: A method for environmental water quality analysis. Applied Geography, 13(2), 153–168. https://doi.org/10.1016/0143-6228(93)90056-7
  • Nichol, J. E. (1993b). Remote-sensing of water-quality in the Singapore-Johor-Riau growth triangle. Remote Sensing of Environment, 43(2), 139–148. https://doi.org/10.1016/0034-4257(93)90003-G
  • Prahs, P., Radeck, V., Mayer, C., Cvetkov, Y., Cvetkova, N., Helbig, H., & Märker, D. (2018). OCT-based deep learning algorithm for the evaluation of treatment indication with anti-vascular endothelial growth factor medications. Graefe’s Archive for Clinical and Experimental Ophthalmology, 256(1), 91–98. https://doi.org/10.1007/s00417-017-3839-y
  • Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. Advances in Neural Information Processing Systems. https://doi.org/10.48550/arXiv.1511.06434
  • Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. 2015 18th International Conference on Medical image computing and computer-assisted intervention (MICCAI), Munich, Germany: Springer.
  • Serte, S., & Demirel, H. (2021). Deep learning for diagnosis of COVID-19 using 3D CT scans. Computers in Biology and Medicine, 132, 104306. https://doi.org/10.1016/j.compbiomed.2021.104306
  • Shao, H., Ding, F., Yang, J., & Zheng, Z. (2021). Remote sensing information extraction of black and odorous water based on deep learning. Journal of Yangtze River Science Research Institute, 1–10. https://doi.org/10.11988/ckyyb.20210045
  • Shouno, H., Suzuki, S., & Kido, S. (2015). A transfer learning method with deep convolutional neural network for diffuse lung disease classification. 2015 22nd International Conference on Neural Information Processing (ICONIP), Istanbul, Turkey: Springer.
  • Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. 2015 3rd International Conference on Learning Representations (ICLR), San Diego, USA.
  • Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929–1958. https://www.jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf?utm_content=buffer79b43&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
  • Summers, J. (1986). Water quality modeling of the Huangpu River in Shanghai, China. 1986 National Environmental Engineering Conference: Use and Abuse of Environmental Information in Engineering, Melbourne, Victoria: InformIT.
  • Tan, L., Lv, X., Lian, X., & Wang, G. (2021). Yolov4_drone: UAV image target detection based on an improved YOLOv4 algorithm. Computers & Electrical Engineering, 93, 107261. https://doi.org/10.1016/j.compeleceng.2021.107261
  • Tian, C., Xu, Y., Li, Z., Zuo, W., Fei, L., & Liu, H. (2020). Attention-guided CNN for image denoising. Neural Networks, 124, 117–129. https://doi.org/10.1016/j.neunet.2019.12.024
  • Tung, T. M., Yaseen, Z. M., & Yaseen, Z. M. (2020). A survey on river water quality modelling using artificial intelligence models: 2000–2020. Journal of Hydrology, 585, 124670. https://doi.org/10.1016/j.jhydrol.2020.124670
  • Tung, T. M., Yaseen, Z. M., & Yaseen, Z. M. (2021). Deep learning for prediction of water quality index classification: Tropical catchment environmental assessment. Natural Resources Research, 30(6), 4235–4254. https://doi.org/10.1007/s11053-021-09922-5
  • Wang, Q., Xu, J., Chen, Y., Li, J., & Wang, X. (2012). Influence of the varied spatial resolution of remote sensing images on Urban and rural residential information extraction. Resources Science, 34(1), 159–165. https://do.org/CNKI:SUN:ZRZY.0.2012-01-024
  • Wen, S., Wang, Q., Li, Y., Zhu, L., Lü, H., Lei, S., Ding, X., & Miao, S. (2018). Remote sensing identification of urban black-odor water bodies based on high-resolution images: A case study in Nanjing. Environmental Sciences, 39(1), 57–67. https://doi.org/10.13227/j.hjkx.201703264
  • Wu, S. (2019). Research progress of remote sensing monitoring key technologies for urban black and odorous water bodies. Chinese Journal of Environmental Engineering, 13(6), 1261–1271. https://doi.org/10.12030/j.cjee.201812020
  • Wu, Y., Han, P., & Zheng, Z. (2021). Instant water body variation detection via analysis on remote sensing imagery. Journal of Real-Time Image Processing, 18(5), 1577–1590. https://doi.org/10.1007/s11554-020-01062-y
  • Yao, H., Lu, Y., & Gong, Z. (2019). Remote sensing identification of urban black and odorous water body based on PlanetScope images: A case study in Qinzhou, Guangxi. Environmental Engineering, 37(10), 35–43. https://do.org/CNKI:SUN:YGXB.0.2019-02-005
  • Yao, Y., Shen, Q., Zhu, L., Gao, H., Cao, H., Han, H., Sun, J., & Li, J. (2019). Remote sensing identification of urban black-odor water bodies in Shenyang city based on GF-2 image. Journal of Remote Sensing, 23(2), 230–242. https://doi.org/10.11834/jrs.20197482
  • Yaseen, Z. M. (2021). An insight into machine learning models era in simulating soil, water bodies and adsorption heavy metals: Review, challenges and solutions. Chemosphere, 277, 130126. https://doi.org/10.1016/j.chemosphere.2021.130126
  • Yuan, P., Xu, L., Baoling, K. E., Sun, F., & Gao, H. (2020). Treatment and ecological restoration of black and odorous water body in Yueya Lake in Nanjing City. Journal of Environmental Engineering Technology, 10(5), 696–701. https://doi.org/10.12153/j.issn.1674-991X.20200111
  • Zhang, Y., Shang, J., & Yu, X. (2005). Application of principal component-cluster analysis complex model to water environment management: A case study in Songhua River in Jilin section as an example. Advances in Water Science, 16(4), 592. https://doi.org/10.14042/j.cnki.32.1309.2005.04.021
  • Zhao, J., Hu, C., Lapointe, B., Melo, N., Johns, E. M., & Smith, R. H. (2013). Satellite-observed black water events off Southwest Florida: Implications for coral reef health in the florida keys national marine sanctuary. Remote Sensing, 5(1), 415–431. https://doi.org/10.3390/rs5010415
  • Zhao, Y., Zheng, G., Xu, Z., Qiu, Z., & Chen, Z. (2022). Multiscale feature weighted-aggregating and boundary enhancement network for semantic segmentation of high-resolution remote sensing images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 15, 8118–8130. https://doi.org/10.1109/JSTARS.2022.3205609
  • Zheng, G., Le, X., Wang, H., & Hua, W. (2017). Inversion of water depth from WorldView-02 satellite imagery based on BP and RBP neural network. Earth Science, 42(12), 2345–2353. https://doi.org/10.3799/dqkx.2017.552
  • Zheng, G., Pan, Z., Meng, Y., & Wang, H. (2021). Inversion of sea surface flow field in Southern South China sea based on satellite remote sensing data. Earth Science, 46(1), 341–349. https://doi.org/10.3799/dqkx.2020.250