Research Article

Semantic segmentation for plastic-covered greenhouses and plastic-mulched farmlands from VHR imagery

Pages 4553-4572 | Received 08 Aug 2023, Accepted 22 Oct 2023, Published online: 06 Nov 2023

ABSTRACT

Due to their important role in maintaining temperature and soil moisture, agricultural plastic covers, which include both plastic-covered greenhouses (PCGs) and plastic-mulched farmlands (PMFs), have been widely utilized around the globe to improve crop-growing conditions. However, separating PCGs from PMFs is a challenging and long-neglected issue due to their spectral similarity. The objective of this study is to propose a deep semantic segmentation model for accurate PCG and PMF mapping based on very high-resolution satellite images and to improve the model’s spatial generalization capability using a transfer learning strategy. Specifically, the proposed semantic segmentation model has an encoder-decoder structure, where the encoder is composed of a new convolutional neural network for discriminative spatial feature learning, while the decoder utilizes a multi-task strategy to improve the predictions on the boundaries. Meanwhile, a transfer learning framework is adopted to increase mapping performance and generalization ability under limited samples. Experimental results in several typical regions across the Eurasian continent show that the proposed model can separate PCGs from PMFs accurately, with a mean overall accuracy of 94.49% and an average mIoU of 0.8377. Ablation studies verify the roles of the encoder-decoder design and the transfer learning strategy in improving classification performance.

1. Introduction

With the rapid development of the plastic industry, plastic films have been heavily used in agriculture worldwide, leading to the widespread use of agricultural plastic covers (APCs), which mainly consist of plastic-covered greenhouses (PCGs) and plastic-mulched farmlands (PMFs) (Chang et al. Citation2013; Jiménez-Lao et al. Citation2020). For instance, the area of PCGs in China was only 810 km2 in 2006 (Wu et al. Citation2016); however, it increased rapidly to 10330 km2 in 2019 (Feng, Niu, Zhu, et al. Citation2021), a growth rate of about 732.3 km2/year. Meanwhile, the associated problem of environmental pollution cannot be overlooked. On the one hand, plastic covers can change the soil’s physicochemical properties, causing soil salinization and hardening. On the other hand, plastic covers, especially plastic mulching films, can increase the microplastics in soil. Therefore, monitoring APCs is of great significance for both the agricultural and environmental fields. However, most previous studies focused on either PCGs or PMFs alone, neglecting the separation of these two kinds of APCs. Hence, the objective of this study is to propose a semantic segmentation model for the accurate classification of both PCGs and PMFs.

As for data sources, previous studies mainly used moderate-resolution optical satellite images such as Landsat-5/7/8 and Sentinel-2 for APC mapping (Hao et al. Citation2019; Lu, Tao, and Di Citation2018). For instance, Ou et al. (Citation2019) utilized Landsat-5/7/8 imagery to classify PCGs in the Shandong Province of China based on Google Earth Engine (GEE). Novelli et al. (Citation2016) evaluated the performance of Sentinel-2 and Landsat-8 in PCG classification using an object-based method in Almeria of Spain. However, PCGs and PMFs have very similar appearances (shapes, textures, etc.) on Landsat or Sentinel images, making them difficult to separate.

To tackle the above issue, we introduce the first hypothesis: with the increase of spatial resolution, it becomes possible to separate PCGs from PMFs. Indeed, in very high-resolution (VHR) images, PCGs and PMFs manifest different spatial appearances. One difference is that PCGs cast shadows generated by their stereo structures, while PMFs rarely show any shadows. The other difference is texture, where PCGs look more like buildings while PMFs are flatter and more similar to water surfaces. Therefore, we utilize VHR images for PCG and PMF separation in this study.

As for APC mapping methods, previous studies mainly utilized handcrafted features and classical machine learning methods (Ji et al. Citation2020; Yang et al. Citation2017). For instance, Feng, Niu, Zhu, et al. (Citation2021) utilized random forest to generate the first 30-m national PCG map of China using the GEE cloud platform and achieved a good overall accuracy of 89%. Hao et al. (Citation2019) proposed a new workflow to classify the PMF land cover using multi-temporal Sentinel-2 data. Meanwhile, with the success of deep learning in remote sensing, it has also been utilized for APC mapping (Li et al. Citation2022; Ma et al. Citation2021). Deep learning gets rid of the feature design process and integrates feature extraction and classification into an end-to-end deep neural network, which can learn the most representative features automatically according to the loss function. For instance, both Chen et al. (Citation2021) and Zhang et al. (Citation2021) utilized deep learning for PCG mapping in Shouguang, China and achieved good performance.

However, even in the era of deep learning, the accurate separation between PCGs and PMFs has still been neglected. To tackle this issue, we introduce the second hypothesis: could deep learning yield high performance for the segmentation of both PCGs and PMFs? To justify this hypothesis, several research questions are listed as follows. First, how can we design a robust encoder that learns the discriminative features of PCGs and PMFs? Second, as most PCGs and PMFs have sharp edges, how can these edges be restored in the model’s predictions to increase the mapping performance? Besides, how to increase the generalization capability across different regions under limited samples remains an important yet open question.

To address the above questions, we propose a framework using a deep semantic segmentation model for APC mapping from VHR images and test the model’s transferability. Specifically, the contributions of this study are as follows. First, we justified the feasibility of our previous DNCNN (dilated and non-local convolutional neural network) as the encoder for spatial feature learning from VHR images. Second, we designed a multi-task decoder to improve the APC segmentation performance at boundaries, assisted by the task of edge-aware learning. Third, we exploited an effective transfer learning method to improve the model’s generalization capability across the Eurasian continent. Finally, the sample dataset, CAU-APC-SEG, is publicly available.

2. Study area

To comprehensively validate the performance of our proposed model, we carefully selected four representative study areas spanning the Eurasian continent, including Shenxian in Liaocheng of China, Al-Kharj in Saudi Arabia, Adana in Turkey and Moguer in Spain (Figure 1). The reason for selecting these four study regions lies in their variability and representativeness. As for variability, the four study regions are situated in different typical climate zones of the Eurasian continent, including the temperate monsoon climate (Shenxian of China), the continental climate (Al-Kharj of Saudi Arabia) and the Mediterranean climate (Adana of Turkey and Moguer of Spain). As for representativeness, all four study areas exhibit a typical APC landscape of their local regions. Shenxian of China has both PCGs and PMFs, which could represent the APC characteristics of Eastern Asia. Al-Kharj of Saudi Arabia also has both PCGs and PMFs and can be viewed as representative of the APC landscape in the arid and semi-arid regions of Central Asia. As for Spain and Turkey, the former has large-scale PCGs while the latter has many PMFs, which together reveal the APC characteristics of the western Eurasian continent.

Figure 1. Study areas. (a)∼(d) are maps of four study areas; (e)∼(h) are PCG and PMF images.


Study area A is Moguer, which is situated in the southwestern Spanish province of Huelva and experiences a typical Mediterranean climate, with geographic coordinates of 6°39'4" W to 6°52'20" W and 37°12'26" N to 37°18'46" N. PCGs are predominant in this study area and exhibit both smaller sizes and a neater spatial pattern compared to the other study areas.

Study area B is in the southern part of Adana, which is located in Adana province, Turkey, also characterized by a typical Mediterranean climate. The geographical range is 34°59'16" E to 35°10'46" E and 36°37'52" N to 36°46'27" N. Study area B is dominated by PMFs, most of which appear white but exhibit great spatial variation.

Study area C is in Al-Kharj, which is situated in the east-central part of the Kingdom of Saudi Arabia and has a continental climate. Al-Kharj serves as a modern agricultural oasis for cereals, vegetables and fruits, benefiting from a substantial level of rainfall. The geographical range of study area C is 47°0'57" E to 47°14'23" E and 23°51'30" N to 24°13'21" N. Different from the other study regions, both PCGs and PMFs in study area C are surrounded by yellow or red bare soil.

Study area D is in Shenxian, which is situated in Liaocheng City, Shandong province, China. Shenxian is a prominent agricultural food-producing area within Shandong Province with a temperate monsoon climate. The geographic coordinates range from 115°29'55" E to 115°52'16" E and 36°24'53" N to 36°40'4" N. It witnesses a rather complex distribution of both PCGs and PMFs, where they are often crisscrossed with each other.

3. Material and methods

3.1. VHR remote sensing image collection

Google Earth (GE) images are used with a very high resolution of 1.2 m/pixel. A GE image consists of only three bands (i.e. red-green-blue, RGB). Although the near-infrared band is missing compared to other VHR satellites such as IKONOS, this has nearly no impact on the separation of PCGs and PMFs, since the proposed model mainly relies on spatial features rather than spectral signatures for classification. Meanwhile, several previous studies also verified the effectiveness of GE-RGB data for APC mapping (Chen et al. Citation2021; Feng, Niu, Yao, et al. Citation2021).

After acquisition, the VHR images were cropped into a series of image patches with a size of 224 × 224. This size was chosen for its prevalence in the computer vision field: several deep neural networks, such as VGG and ResNet, take 224 × 224 as a standard input size.
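The tiling step can be sketched as follows. This is a minimal NumPy sketch; the function name and the decision to discard partial edge patches are assumptions, since the paper does not state how image borders are handled.

```python
import numpy as np

def crop_into_patches(image, patch_size=224):
    """Tile an H x W x C image into non-overlapping patch_size x patch_size patches.

    Edge rows/columns that cannot fill a whole patch are discarded in this sketch.
    """
    h, w = image.shape[:2]
    patches = []
    for top in range(0, h - patch_size + 1, patch_size):
        for left in range(0, w - patch_size + 1, patch_size):
            patches.append(image[top:top + patch_size, left:left + patch_size])
    return np.stack(patches)

# A hypothetical 448 x 672 RGB scene tiles into a 2 x 3 grid of patches.
scene = np.zeros((448, 672, 3), dtype=np.uint8)
patches = crop_into_patches(scene)
print(patches.shape)  # (6, 224, 224, 3)
```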

3.2. Classification scheme

LabelImg (https://github.com/tzutalin/labelImg) is an open-source image labeling tool. It was used to label the following three categories: PCG, PMF and others. All the boundaries of both PCGs and PMFs were carefully delineated.

After labeling, we randomly split the images into a training set and a test set with a ratio of 7:3 (). The former is used for training the deep semantic segmentation model and the latter for accuracy assessment (http://doi.org/10.6084/m9.figshare.22722925). Table 1 shows that Shenxian of China has the largest number of samples. Therefore, Shenxian is selected as the source domain while the other study areas are the target domains in the model transfer experiment. More details are in Section 3.3.3, Model transfer across study regions ().
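The 7:3 random split can be sketched in a few lines. The fixed seed and function name are illustrative assumptions, not from the paper.

```python
import random

def split_samples(samples, train_ratio=0.7, seed=42):
    """Shuffle and split a list of sample identifiers into train/test subsets."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]

train_ids, test_ids = split_samples(range(100))
print(len(train_ids), len(test_ids))  # 70 30
```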

Figure 2. Remote sensing images and labels of APCs for each study area. (a) Shenxian of China; (b) Al-Khar of Saudi; (c) Adana of Turkey; (d) Moguer of Spain.


Table 1. Sample scheme for each study area.

3.3. Structure of APC-Net

This section will describe the detailed structure of the proposed APC-Net (https://github.com/MrSuperNiu/APCNet), including the multi-scale encoder, multi-task decoder, model transfer design, etc.

Figure 3 illustrates that APC-Net has an encoder-decoder structure. The input to APC-Net is an image patch with a size of 224 × 224, while the output is the per-pixel segmentation result. The DNCNN proposed in our previous work (Feng, Niu, Chen, et al. Citation2021) is utilized as the encoder, which aims to extract multi-scale and hierarchical features from remote-sensing images. As for the decoder, we learn from UNet (Ronneberger, Fischer, and Brox Citation2015) and also add skip connections for feature reuse and enhancement, aiming to increase the localization precision of the segmentation results. Meanwhile, to further refine the predictions on the APCs’ boundaries, we design a multi-task strategy at the end of the decoder, where the task of edge-aware learning helps refine the semantic segmentation task.

Figure 3. Structure of the APC-Net.


3.3.1. DNCNN as encoder

The role of an encoder in a semantic segmentation model is to learn discriminative features, hence increasing the inter-class separability. Considering that our previous study has already designed an effective backbone neural network, i.e. DNCNN, for patch-based PCG and PMF classification, we are highly motivated to extend its use to semantic segmentation. Therefore, DNCNN is selected as the encoder of APC-Net. Figure 4 depicts the detailed structure of DNCNN, including a series of multi-scale dilated convolutional networks (MDCNs) and a non-local module. The former performs multi-scale local feature extraction to tackle the scale variations of PCGs and PMFs, while the latter performs global contextual feature modeling to further increase the capability of global scene understanding.

Figure 4. Structure of the DNCNN.

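To illustrate the dilation mechanism that the MDCN modules build on, here is a minimal single-channel NumPy sketch. The real encoder uses learned multi-channel filters in a deep network; this only shows how a dilation rate enlarges the receptive field of a 3 × 3 kernel without adding parameters. The function name is an assumption.

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation=1):
    """'Valid' 2D cross-correlation with a dilation rate.

    A k x k kernel with dilation d covers a (d*(k-1)+1)-pixel receptive field,
    so larger rates see more spatial context with the same parameter count.
    """
    k = kernel.shape[0]
    span = dilation * (k - 1) + 1                  # effective receptive field
    out_h = x.shape[0] - span + 1
    out_w = x.shape[1] - span + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i:i + span:dilation, j:j + span:dilation]
            out[i, j] = float(np.sum(window * kernel))
    return out

x = np.arange(49, dtype=float).reshape(7, 7)
k = np.ones((3, 3))
print(dilated_conv2d(x, k, dilation=1).shape)  # (5, 5)
print(dilated_conv2d(x, k, dilation=2).shape)  # (3, 3) -- same kernel, wider context
```

Stacking branches with different rates, as the MDCN design does, lets one layer see objects at several scales at once.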

3.3.2. Multi-task decoder

The role of the decoder is to restore the encoded features to the per-pixel segmentation results. To further refine the segmentation performance on the boundaries, we have designed a multi-task decoder which includes a mainframe task (i.e. semantic segmentation of APC) and an auxiliary task (i.e. edge extraction of APC).

The objective of the auxiliary task is to delineate the boundaries of both PCGs and PMFs from the remote sensing patch. The ground truth (GT) data (i.e. the edges of APCs) are generated using the classical Canny contour detection method. Figure 5 shows that the GT images have very clear and accurate boundaries of both PCGs and PMFs.

Figure 5. (a) Image patch; (b) mask; (c) boundary.

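Boundary GT generation from a label mask can be sketched as follows. Note that the paper applies the Canny detector; this NumPy stand-in simply marks pixels whose 4-neighbourhood contains a different label, which yields a comparable boundary mask for clean label images.

```python
import numpy as np

def mask_to_boundary(mask):
    """Mark a pixel as boundary if any 4-neighbour carries a different label.

    Simplified stand-in for edge GT generation (the paper uses Canny).
    """
    padded = np.pad(mask, 1, mode="edge")
    center = padded[1:-1, 1:-1]
    boundary = np.zeros(mask.shape, dtype=bool)
    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        shifted = padded[1 + dy:padded.shape[0] - 1 + dy,
                         1 + dx:padded.shape[1] - 1 + dx]
        boundary |= shifted != center
    return boundary.astype(np.uint8)

mask = np.zeros((6, 6), dtype=np.uint8)
mask[2:4, 2:4] = 1  # one small plastic-cover polygon
print(int(mask_to_boundary(mask).sum()))  # 12 boundary pixels around the polygon
```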

The detailed structure of the multi-task learning part is as follows. As shown in Figure 3, the module in the red dashed box is the edge monitoring module, which consists of an upsampling layer, two convolutional layers and a classification layer. It calculates the loss between the predicted boundary and the true boundary using the Lssim loss function, helping the segmentation task obtain more accurate boundary information.

The rationale behind multi-task learning is as follows. If semantic segmentation were the only task, the model would pay similar or even less attention to the predictions at the APC boundaries, because the inner pixels account for a much larger ratio than the boundary pixels. Under this circumstance, the accuracy of the boundaries would have little effect on the model’s overall performance. Therefore, we add edge-aware learning as an auxiliary task to force the model to pay more attention to those boundary pixels, which improves the final visual quality of the predictions.

3.3.3. Model transfer across study regions

Transfer learning aims to enhance performance on target domains by leveraging knowledge derived from related source domains. Among the various transfer learning strategies, pre-training followed by fine-tuning is one of the most widely employed approaches. This method is intuitive and data-driven: it first trains a model with robust generalization capabilities on a large-scale dataset, and then fine-tunes it for a downstream task (Jiang et al. Citation2022; Yamada and Otani Citation2022; Zhang and Gao Citation2022).

The fundamental objective of the pre-training and fine-tuning method is to share parameter information between the source and target domains, promoting information migration between them. This approach relies on the underlying assumption that the source and target domain data can share certain model parameters.

More precisely, during model parameter transfer, the pre-trained weights are loaded into the target domain model while the parameters of the first few layers are fixed. Subsequently, only the later layers are fine-tuned to fit the current task. This approach not only reduces the training time but also obviates the need to train the network from scratch. Furthermore, it enables the source domain model to impart valuable prior knowledge to the target domain model, thereby enhancing the target model’s robustness and generalization capabilities (Guo et al. Citation2019). The L2-SP method (Li, Grandvalet, and Davoine Citation2018) further optimizes fine-tuning performance by constraining the fine-tuned weights toward their starting points from the source domain model, effectively controlling the distance between them.

In this study, we selected China as the source domain primarily due to the abundance of training samples available, where Shenxian of China witnesses a greater number of both PCGs and PMFs than other study areas. Besides, the APC spatial distribution patterns within Shenxian are characterized by a greater diversity and complexity, which shows a larger amount of information that could be learned and then transferred to target domains.

Furthermore, both the source model and the target model are initialized with pre-trained weights, except for the decoder layers. The parameters of the source model are fixed, while those of the target model are trainable. This facilitates the alignment of features in the target model with those in the source model via the L2-SP transfer learning strategy.

Specifically, L2-SP regularization is a state-of-the-art regularization method for the pre-training and fine-tuning transfer learning strategy, which incorporates an L2 penalty to give the fine-tuned network a clear inductive bias aligned with the pre-trained model. It is more effective than the traditional L2 regularization method and layer-freezing strategies (Guo et al. Citation2019; Li, Grandvalet, and Davoine Citation2018). Therefore, L2-SP is utilized as the transfer learning method in this study, penalizing the distance between the weights of the target domain and those of the source domain to improve the classification accuracy on the target domain. The penalty function of L2-SP regularization is as follows:

(1) $\Omega(\omega) = \frac{\alpha}{2}\left\lVert \omega_{s} - \omega_{s}^{0} \right\rVert_{2}^{2} + \frac{\beta}{2}\left\lVert \omega_{\bar{s}} \right\rVert_{2}^{2}$

where $\Omega(\omega)$ is the regularizer, $\omega$ and $\omega^{0}$ are the network parameters of the target and source domain tasks, respectively; $\omega_{s}$ is the part of the target network that shares its structure with the source network, and $\omega_{\bar{s}}$ denotes the remaining part; $\alpha$ and $\beta$ are the hyperparameters of the penalty function.
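The L2-SP penalty can be sketched in a few lines of NumPy. The hyperparameter values here are illustrative assumptions, not the ones used in the paper.

```python
import numpy as np

def l2_sp_penalty(w_shared, w_shared_src, w_rest, alpha=0.1, beta=0.01):
    """Omega(w) = (alpha/2)*||w_s - w_s0||^2 + (beta/2)*||w_sbar||^2.

    Pulls the shared (encoder) weights toward their pre-trained starting point
    and applies plain L2 shrinkage to the non-shared (e.g. decoder) weights.
    """
    sp_term = 0.5 * alpha * np.sum((w_shared - w_shared_src) ** 2)
    l2_term = 0.5 * beta * np.sum(w_rest ** 2)
    return sp_term + l2_term

w_s = np.array([1.0, 2.0])    # fine-tuned shared weights
w_s0 = np.array([0.5, 1.5])   # their pre-trained starting point
w_bar = np.array([3.0])       # weights not shared with the source network
penalty = l2_sp_penalty(w_s, w_s0, w_bar)
print(penalty)  # (0.1/2)*0.5 + (0.01/2)*9 = 0.07, up to float rounding
```

In training, this scalar would simply be added to the task loss before backpropagation.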

3.4. Training details

This section will describe the training details, including the design of a hybrid loss function and the configuration of several hyper-parameters. Specifically, a hybrid loss is designed for PCG and PMF segmentation, consisting of cross entropy (CE) loss, intersection-over-union (IoU) loss and structural similarity index measure (SSIM) loss (Niu et al. Citation2022; Qin et al. Citation2019). Therein, CE loss and IoU loss are widely used in semantic segmentation tasks to measure the deviation between ground truth and predictions, with the following formulas:

(2) $L_{ce} = -\sum_{i}\sum_{c} y_{i,c} \log(p_{i,c})$

where $y$ is the ground truth, $p$ is the prediction result, $i$ indexes the $i$-th pixel and $c$ denotes the $c$-th class.

(3) $L_{iou} = \sum_{i}\left(1 - \frac{P_{i} \cap GT_{i}}{P_{i} \cup GT_{i}}\right)$

where $P$ and $GT$ denote the prediction and ground truth of the model, respectively, and $i$ stands for the $i$-th pixel.
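Equations (2) and (3) can be sketched as follows in NumPy. The soft union in the IoU loss, written as $|P| + |GT| - |P \cap GT|$, is a common differentiable formulation and an assumption here, since the paper only gives the set form.

```python
import numpy as np

def ce_loss(y_onehot, p):
    """Pixel-wise cross entropy: -sum_i sum_c y_{i,c} * log(p_{i,c})."""
    return float(-np.sum(y_onehot * np.log(p + 1e-12)))

def soft_iou_loss(p, gt):
    """Soft IoU loss: 1 - intersection/union, with the union written as
    |P| + |GT| - |P n GT| so it stays differentiable for probabilistic P."""
    inter = np.sum(p * gt)
    union = np.sum(p) + np.sum(gt) - inter
    return float(1.0 - inter / union)

gt = np.array([[1.0, 1.0], [0.0, 0.0]])
perfect = np.array([[1.0, 1.0], [0.0, 0.0]])
half = np.array([[1.0, 0.0], [0.0, 0.0]])
print(soft_iou_loss(perfect, gt))  # 0.0
print(soft_iou_loss(half, gt))     # 0.5
```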

Meanwhile, since we have added an edge-aware learning auxiliary task, an edge loss should also be involved. Considering that SSIM is a good index for describing the global structure of an image and is very sensitive to edges and boundaries, it is utilized as the edge loss in this study. Another merit of SSIM loss is that it can keep optimizing the boundaries even when the CE loss and IoU loss have flattened. Speckle noise is also suppressed thanks to the use of SSIM. The formula of the SSIM loss is as follows:

(4) $L_{ssim} = 1 - \frac{(2\mu_{x}\mu_{y} + C_{1})(2\sigma_{xy} + C_{2})}{(\mu_{x}^{2} + \mu_{y}^{2} + C_{1})(\sigma_{x}^{2} + \sigma_{y}^{2} + C_{2})}$

where $x$ and $y$ represent the predicted and true labels, respectively; $\mu_{x}$, $\mu_{y}$ and $\sigma_{x}$, $\sigma_{y}$ are the means and standard deviations of $x$ and $y$; $\sigma_{xy}$ is their covariance; and $C_{1}$ and $C_{2}$ are small constants to prevent division by zero.
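Equation (4) can be sketched as a single global computation over two maps. The constant values and the global (rather than windowed) formulation are assumptions here.

```python
import numpy as np

def ssim_loss(x, y, c1=1e-4, c2=9e-4):
    """L_ssim = 1 - SSIM(x, y), computed globally over the two label maps."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return 1.0 - ssim

a = np.linspace(0.0, 1.0, 16).reshape(4, 4)
print(round(ssim_loss(a, a), 6))      # 0.0 for a perfect prediction
print(ssim_loss(a, 1.0 - a) > 1.0)    # True: anti-correlated maps are penalised hard
```

Because SSIM compares structure rather than per-pixel values, the loss stays informative at edges even when the pixel-wise terms have plateaued.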

The total loss function for APC-Net is the combination of the CE, IoU and SSIM losses:

(5) $L_{hybrid} = L_{ce} + L_{iou} + L_{ssim}$

Meanwhile, the deep learning framework is PyTorch 1.10.0, the GPU is an NVIDIA GeForce RTX 2080 Ti with 11 GB of memory, the CPU is an Intel Core [email protected] GHz, and the operating system is Ubuntu 20.04. The model’s optimizer is Adam, with an initial learning rate of 1e-4.

3.5. Accuracy evaluation

Two kinds of methods are utilized for accuracy evaluation. The first is visual inspection to evaluate the visual quality of the segmentation results. The second is to calculate several accuracy metrics from the testing dataset, including mIoU, overall accuracy (OA) and the Kappa index. The formula of mIoU is as follows:

(6) $mIoU = \frac{1}{N}\sum_{i=1}^{N}\frac{P_{i} \cap GT_{i}}{P_{i} \cup GT_{i}}$

where $P_{i}$ denotes the model prediction for the $i$-th class and $GT_{i}$ denotes the ground truth of the $i$-th class. The intersection over union of $P_{i}$ and $GT_{i}$ is the IoU of class $i$, and the mean IoU over all $N$ classes is the mIoU of the study area. mIoU ranges from 0 to 1; the closer it is to 1, the closer the prediction is to the ground truth and the better the classification.
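Equation (6) can be sketched as follows in NumPy. The convention of scoring a class absent from both prediction and ground truth as 1.0 is an assumption.

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """mIoU = (1/N) * sum over the N classes of |P_i n GT_i| / |P_i u GT_i|."""
    ious = []
    for c in range(num_classes):
        p, g = pred == c, gt == c
        inter = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()
        # a class absent from both prediction and GT scores a perfect 1.0 here
        ious.append(inter / union if union else 1.0)
    return float(np.mean(ious))

gt = np.array([0, 0, 1, 1])
pred = np.array([0, 1, 1, 1])
print(mean_iou(pred, gt, num_classes=2))  # class 0: 1/2, class 1: 2/3 -> mean 7/12
```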

OA equals the ratio of all the correctly predicted pixels to the total number of pixels. Kappa index is widely used to measure the consistency of predictions and GT values.

3.6. Experiment flowchart

As depicted in Figure 6, first, we obtained and pre-processed very high-resolution Google Earth imagery for each study area. Subsequently, the remote sensing images were cut into small patches and then carefully annotated. Afterwards, we subdivided the labeled samples into source and target datasets. Next, we constructed the APC-Net, incorporating a DNCNN encoder and a multi-task decoder. As for model training, the L2-SP transfer learning strategy, the Adam optimizer and a hybrid loss function were all employed to fine-tune the APC-Net. Additionally, we conducted comparative assessments against several classic semantic segmentation models and subsequently presented the APC mapping results for each study area. Lastly, we performed rigorous validation of APC-Net through ablation experiments to justify the role of each proposed module.

Figure 6. Flowchart of this study. Note. S represents the source domain and T stands for the target domain.


4. Results

4.1. Mapping results of PCGs and PMFs

This section will illustrate the mapping results of both PCGs and PMFs in various study regions. Several predictions on test datasets are also given to better evaluate the segmentation performance.

Figure 7 indicates that the proposed model shows good performance for the separation of PCGs and PMFs in all the study regions. The predicted distribution maps are in accordance with the ground truth, and no obvious classification errors are witnessed. In addition, several predictions on the test dataset are also given, covering PCG-dominant, PMF-dominant and PCG-PMF co-existence regions. The proposed model yields good performance when inspected in the detailed maps, which justifies its capability to predict APCs under various circumstances. Besides, the figure shows that the predicted PCGs and PMFs have very neat boundaries and little speckle noise. Above all, the qualitative evaluation verifies that APC-Net can separate PCGs and PMFs from each other across the Eurasian continent.

Figure 7. APC mapping results for (a) Shenxian of China; (b) Al-Kharj of Saudi Arabia; (c) Adana of Turkey; (d) Moguer of Spain.


4.2. Accuracy evaluation results

This section will show the quantitative accuracy evaluation results calculated from the testing dataset, including mIoU, OA and the Kappa index.

Table 2 indicates that the proposed APC-Net shows good performance in all four study areas, with an average mIoU of 0.8377, an OA of 94.49% and a Kappa index of 0.8933. Shenxian witnesses the highest mapping accuracy, with an mIoU of 0.9064. One possible reason is that Shenxian has a larger number of training samples than the other study regions: the data-hungry nature of deep learning makes the model prone to overfitting with fewer samples, leading to relatively lower accuracy elsewhere.

Table 2. Accuracy evaluation results.

4.3. Ablation analysis

To assess the efficacy of the proposed APC-Net model, we carried out several ablation studies, with the results presented below.

4.3.1. Results of DNCNN as an encoder in APC-Net

In this section, we will further justify the performance of DNCNN as the encoder for APC mapping. As mentioned before, DNCNN was proposed in our previous study and showed good performance in patch-based PCG and PMF classification. However, its role as an encoder for semantic segmentation of APCs should be further verified. A series of experiments has been made to compare the APC mapping accuracy of different encoders (Table 3). Table 3 indicates that the proposed DNCNN encoder outperforms both the plain CNN and MDCN encoders, increasing the average mIoU by 0.0469 and 0.0112, respectively. Further elaboration and discussion are provided in Section 5.1.

Table 3. Comparison between different encoders.

4.3.2. Results of multi-task learning for APC-Net

This section will verify the effectiveness of the multi-task learning strategy (i.e. the semantic segmentation task plus the edge-aware learning task) for APC mapping. An ablation study has been performed to show both the accuracy and the mapping details. We evaluate the APC mapping results from both the semantic segmentation task and the edge-aware learning task; the comparison results for these two tasks are as follows.

Table 4 indicates that, with the inclusion of edge-aware learning as an auxiliary task, the classification accuracy increases only slightly for all the study regions. This is predictable since the edges account for only a small fraction of all pixels. Therefore, the increase in correctly predicted edge pixels cannot significantly enhance the overall performance of the APC semantic segmentation results. However, the auxiliary edge-aware learning task is more important in refining the predicted boundaries of both PCGs and PMFs.

Table 4. Accuracy for mask predictions with and without multi-task decoders.

To further justify the role of the multi-task decoder in boundary refinement, we also performed an ablation study that directly predicts boundaries using both the single-task decoder and the multi-task decoder. Table 5 shows the mIoU, OA and Kappa of the boundary predictions with and without edge-aware learning.

Table 5. Accuracy for boundary predictions with and without multi-task decoders.

Table 5 indicates that for this two-category (i.e. boundary and background) classification, the inclusion of edge-aware learning increases the boundary prediction accuracy compared with the single-task decoder in all four study areas.

4.3.3. Results of model’s generalization capability

This section will further verify the model’s generalization capability based on a series of comparison experiments. Table 6 indicates that after the model transfer, the performance of APC semantic segmentation is improved, with average increases of 4.87%, 1.71% and 3.41% in mIoU, OA and the Kappa index, respectively. A thorough discussion will be provided in Section 5.3.

Table 6. Accuracy comparison of model transfers.

4.3.4. Results of comparison with other methods

This section will compare the proposed APC-Net with other popular semantic segmentation models, including SegNet (Badrinarayanan, Kendall, and Cipolla Citation2016), UNet (Ronneberger, Fischer, and Brox Citation2015), SETR (Zheng et al. Citation2021) and TransUNet (Chen et al. Citation2021), to further verify its performance in PCG and PMF mapping. Table 7 indicates that APC-Net achieves higher accuracy than the other semantic segmentation models, with an increase in average mIoU of 5.39%∼14.64%. A detailed discussion follows in Section 5.4.

Table 7. Accuracy comparison of different models.

5. Discussion

5.1. The role of DNCNN as an encoder in APC-Net

In this section, we will discuss the effectiveness of DNCNN as the encoder in APC-Net. Table 3 shows the accuracy achieved by various encoders. Therein, the plain CNN utilizes only consecutive convolutional layers (i.e. as in UNet) as the encoder and serves as the baseline model. MDCN replaces some of the convolutional layers of the plain CNN with MDCN modules, thereby enhancing the ability to learn multi-scale information. DNCNN is our encoder, which further adds non-local modules (Cao et al. Citation2020) at the end of the encoder to increase the global feature learning capability. The results indicate that DNCNN shows the highest performance, with an average mIoU of 0.8307, a 1.11%∼4.69% gain compared with the other encoders. The plain CNN shows the lowest accuracy, while the addition of the MDCN and non-local layers improves the accuracy, which justifies the role of DNCNN in APC semantic segmentation.

5.2. The effectiveness of multi-task learning for APC-Net

In this section, we will show some cases to further discuss the role of multi-task learning in APC mapping, as shown in Figure 8.

Figure 8. The comparison of mapping results for multi-task learning strategy (a) image patch; (b) GT; (c) predictions with edge-aware learning; (d) predictions without edge-aware learning.


Figure 8 indicates that without the edge-aware learning task, the predicted boundaries are irregular rather than neat and straight when compared with the GT masks. Besides, several thin boundaries could not be delineated correctly. After the inclusion of the auxiliary edge-learning task, the predicted masks show very sharp boundaries and better mapping results.

To better analyze the role of edge-aware learning, we also compare the boundary predictions with and without the multi-task decoder. The comparison indicates that if only the semantic segmentation decoder is used, the boundary predictions contain many errors ((d)). Especially in regions where PCGs and PMFs are densely distributed, many inner boundaries cannot be delineated, which lowers the mapping performance. On the other hand, with the help of the edge-aware learning decoder, it becomes possible to detect every boundary accurately.

5.3. Verification of the model’s generalization capability

To further justify the role of transfer learning in increasing the model’s generalization ability, an extra experiment was performed in which we gradually increased the number of samples in the target domain to illustrate how transfer learning improves the model accuracy. As depicted in Figure 9, the mIoU curves in the three target domains reveal a noticeable trend. When no target domain samples are involved, the mIoU is quite low, indicating that the model’s generalization ability is limited. However, with the inclusion of target domain samples and the help of transfer learning, the APC mapping accuracy improves greatly, which further justifies the role of transfer learning in large-scale APC mapping applications and its significance in improving the model’s spatial generalization capability.

Figure 9. The mIoU trend regarding the model's generalization.

5.4. Comparison with other methods

In this section, we compare the proposed APC-Net with other classic semantic segmentation models. The quantitative results show that APC-Net outperforms the other models: compared with SegNet, UNet, SETR and TransUNet, it improves the average mIoU by 0.0889, 0.0539, 0.1464 and 0.0603, respectively. Figure 10 shows detailed APC predictions on the test dataset by the various models. It also indicates that the proposed APC-Net achieves better mapping performance, especially along boundaries. One possible reason is that the classical semantic segmentation models do not consider the spatial characteristics of PCGs and PMFs, and thus show lower accuracy than the proposed APC-Net. In contrast, APC-Net utilizes the DNCNN to extract hierarchical and discriminative features, a multi-task decoder to refine boundaries, and an SSIM loss function to constrain the boundary predictions, which makes it more suitable for APC mapping from VHR images.
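The mIoU values compared above follow the standard definition: the per-class intersection over union, averaged across classes. A minimal sketch (the toy labels below are ours, not from the paper):

```python
import numpy as np

def mean_iou(y_true: np.ndarray, y_pred: np.ndarray, n_classes: int) -> float:
    """mIoU from flattened label maps: IoU_c = TP_c / (TP_c + FP_c + FN_c).
    Classes absent from both GT and prediction score 0 here (a simplification;
    evaluation protocols sometimes exclude them instead)."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)            # confusion matrix
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp  # TP + FP + FN per class
    return float(np.mean(tp / np.maximum(union, 1)))

# toy example with 3 classes: background (0), PCG (1), PMF (2)
gt   = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 0, 1, 2, 2, 2])
print(round(mean_iou(gt, pred, 3), 4))  # → 0.7222
```

Here class 0 is predicted perfectly (IoU 1.0), one PCG pixel leaks into PMF (IoU 0.5 and 2/3), giving mIoU (1 + 1/2 + 2/3)/3 ≈ 0.7222.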

Figure 10. Detailed APC mapping results for different models.

5.5. Bad case analysis

In this section, we analyze the prediction errors of the proposed model. As depicted in Figure 11, circles in three different colors represent three kinds of mistakes: red circles mark PCGs misclassified as PMFs, blue circles mark PMFs misclassified as PCGs, and yellow circles mark other land covers misclassified as PMFs. Panels (a), (c) and (f) show that in regions where PCGs are densely connected, some small regions within PCGs may be misclassified as PMFs. The reason might be that these small regions have very similar appearances to PMFs, making it challenging to separate the two classes in this scenario. In contrast, there are far fewer misclassifications of PMFs as PCGs (panels (e) and (f)), because most PMFs have flat surfaces rather than the three-dimensional structures of PCGs. Another type of error is bare soil predicted as PMF (panels (a), (b) and (d)): some bare soils show high reflectance and thus appear similar to PMFs. To reduce such errors, future studies could use multispectral imagery rather than RGB data, since bare soils exhibit a distinct spectral curve that would be easier to separate from PMFs.

Figure 11. Mapping results for bad cases.

5.6. Uncertainty and limitations

This section discusses the uncertainties and limitations of the entire PCG and PMF mapping pipeline. Firstly, in terms of observation duration, PMFs can only be observed by satellites within a very short period (e.g. one month in Shenxian, China), from the laying of the plastic mulches to the emergence of crops. Afterwards, as the crops grow, the plastic mulches become covered by leaves and branches when viewed from space, making it very difficult to separate them from other uncovered farmlands. PCGs, in contrast, are visible to satellites in any cloudless weather and therefore impose fewer restrictions on imaging time. The selection of satellite data at the right time (i.e. while the film is on) is thus of great importance, and local agronomic knowledge, especially the timing of mulch laying and the crop types, is very valuable for large-scale PMF mapping.

Secondly, from the perspective of the remote sensing data used, only RGB images from Google Earth are adopted, which lack spectral information, especially the near-infrared bands. Nevertheless, the results show that even with such limited spectral bands it is still feasible to separate PCGs from PMFs accurately. Future studies will consider multispectral data such as WorldView and Pleiades imagery.

Despite the above limitations, this study contributes a reasonable and practicable method to tackle the long-neglected issue of PCG and PMF separation. In the near future, we plan to apply the proposed method to large-scale PCG and PMF mapping in China and abroad, providing high-quality APC distribution datasets for researchers in fields such as agriculture, remote sensing, environmental science and land science.

6. Conclusions

This paper proposed a deep learning-based semantic segmentation model, APC-Net, for the accurate separation of PCGs and PMFs, an issue that has long been neglected in APC monitoring. APC-Net can be viewed as a semantic segmentation extension of our previous model DNCNN, complemented by a multi-task decoder and a hybrid loss function that further refine the boundaries of both PCGs and PMFs. A transfer learning strategy was also exploited to improve the model's spatial transferability across several typical regions of the Eurasian continent. Experimental results in China, Saudi Arabia, Turkey and Spain indicate that APC-Net performs well in PCG and PMF mapping, with a mean OA of 94.49% and an average mIoU of 0.8377. The DNCNN encoder increases the average mIoU by 0.0539 compared with a plain CNN such as UNet, while the multi-task decoder yields neat and tidy boundaries for PCGs and PMFs. Meanwhile, the transfer learning strategy further improves the mapping accuracy in regions with limited samples, and APC-Net outperforms several classical deep semantic segmentation models. Future work will focus on extending the model to large-scale APC mapping, aiming to provide country- or continental-scale datasets of APC distribution.

Code availability statement

The code in this paper, APC-Net, is available at https://github.com/MrSuperNiu/APCNet.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The dataset in this paper, CAU-APC-SEG, is publicly available at http://doi.org/10.6084/m9.figshare.22722925.

Additional information

Funding

This work was supported by the National Key Research and Development Program of China: [Grant Number 2022YFB3903504]; National Natural Science Foundation of China: [Grant Number 42001367].

References

  • Badrinarayanan, Vijay, Alex Kendall, and Roberto Cipolla. 2016. “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation.” IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12): 2481–2495. https://doi.org/10.1109/TPAMI.2016.2644615
  • Cao, Yue, Jiarui Xu, Stephen Lin, Fangyun Wei, and Han Hu. 2020. “Global Context Networks.” IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (6): 6881–6895. https://doi.org/10.1109/TPAMI.2020.3047209
  • Chang, Jie, Xu Wu, Yan Wang, Laura A. Meyerson, Baojing Gu, Yong Min, Hui Xue, Changhui Peng, and Ying Ge. 2013. “Does Growing Vegetables in Plastic Greenhouses Enhance Regional Ecosystem Services Beyond the Food Supply?” Frontiers in Ecology and the Environment 11 (1): 43–49. https://doi.org/10.1890/100223
  • Chen, Boan, Quanlong Feng, Bowen Niu, Fengqin Yan, Bingbo Gao, Jianyu Yang, Jianhua Gong, and Jiantao Liu. 2021. “Multi-modal Fusion of Satellite and Street-View Images for Urban Village Classification Based on a Dual-Branch Deep Neural Network.” International Journal of Applied Earth Observation and Geoinformation 109:102794. https://doi.org/10.1016/j.jag.2022.102794
  • Chen, Jieneng, Yongyi Lu, Qihang Yu, Xiangde Luo, Ehsan Adeli, Yan Wang, Le Lu, Alan L. Yuille, and Yuyin Zhou. 2021. “TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation.” http://arxiv.org/abs/2102.04306.
  • Chen, Wei, Yameng Xu, Zhe Zhang, Lan Yang, Xubin Pan, and Zhe Jia. 2021. “Mapping Agricultural Plastic Greenhouses Using Google Earth Images and Deep Learning.” Computers and Electronics in Agriculture 191:106552. https://doi.org/10.1016/j.compag.2021.106552
  • Feng, Quanlong, Bowen Niu, Boan Chen, Yan Ren, Dehai Zhu, Jianyu Yang, Jiantao Liu, Cong Ou, and Baoguo Li. 2021. “Mapping of Plastic Greenhouses and Mulching Films from Very High Resolution Remote Sensing Imagery Based on a Dilated and non-Local Convolutional Neural Network.” International Journal of Applied Earth Observation and Geoinformation 102:102441. https://doi.org/10.1016/j.jag.2021.102441
  • Feng, Quanlong, Bowen Niu, Dehai Zhu, Xiaochuang Yao, Yiming Liu, Cong Ou, Boan Chen, Jianyu Yang, and Jiantao Liu. 2021. “A Dataset of Remote Sensing-Based Classification for Agricultural Plastic Greenhouses in China in 2019.” China Scientific Data 6 (4), http://www.csdata.org/en/p/671.
  • Guo, Yunhui, Honghui Shi, Abhishek Kumar, Kristen Grauman, Tajana Rosing, and Rogerio Feris. 2019. “SpotTune: Transfer Learning Through Adaptive Fine-Tuning.” 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), 4800–4809, https://ieeexplore.ieee.org/document/8953749.
  • Hao, Pengyu, Zhongxin Chen, Huajun Tang, Dandan Li, and He Li. 2019. “New Workflow of Plastic-Mulched Farmland Mapping Using Multi-Temporal Sentinel-2 Data.” Remote Sensing 11 (11): 1353. https://doi.org/10.3390/rs11111353
  • Ji, Li, Lianpeng Zhang, Yang Shen, Xing Li, Wei Liu, Qi Chai, Rui Zhang, and Dan Chen. 2020. “Object-Based Mapping of Plastic Greenhouses with Scattered Distribution in Complex Land Cover Using Landsat 8 OLI Images: A Case Study in Xuzhou, China.” Journal of the Indian Society of Remote Sensing 48 (2): 287–303. https://doi.org/10.1007/s12524-019-01081-8
  • Jiang, Junguang, Yang Shu, Jianmin Wang, and Mingsheng Long. 2022. “Transferability in Deep Learning: A Survey.” https://arxiv.org/pdf/2201.05867.pdf.
  • Jiménez-Lao, Rafael, Fernando J. Aguilar, Abderrahim Nemmaoui, and Manuel A. Aguilar. 2020. “Remote Sensing of Agricultural Greenhouses and Plastic-Mulched Farmland: An Analysis of Worldwide Research.” Remote Sensing 12 (16): 2649. https://doi.org/10.3390/rs12162649
  • Li, Hongzhou, Yuhang Gan, Yujie Wu, and Li Guo. 2022. “EAGNet: A Method for Automatic Extraction of Agricultural Greenhouses from High Spatial Resolution Remote Sensing Images Based on Hybrid Multi-Attention.” Computers and Electronics in Agriculture 202:107431. https://doi.org/10.1016/j.compag.2022.107431
  • Li, Xuhong, Yves Grandvalet, and Franck Davoine. 2018. “Explicit Inductive Bias for Transfer Learning with Convolutional Networks.” http://arxiv.org/abs/1802.01483.
  • Lu, Lizhen, Yuan Tao, and Liping Di. 2018. “Object-Based Plastic-Mulched Landcover Extraction Using Integrated Sentinel-1 and Sentinel-2 Data.” Remote Sensing 10 (11): 1820. https://doi.org/10.3390/rs10111820
  • Ma, Ailong, Dingyuan Chen, Yanfei Zhong, Zhuo Zheng, and Liangpei Zhang. 2021. “National-Scale Greenhouse Mapping for High Spatial Resolution Remote Sensing Imagery Using a Dense Object Dual-Task Deep Learning Framework: A Case Study of China.” ISPRS Journal of Photogrammetry and Remote Sensing 181:279–294. https://doi.org/10.1016/j.isprsjprs.2021.08.024
  • Niu, Bowen, Quanlong Feng, Boan Chen, Cong Ou, Yiming Liu, and Jianyu Yang. 2022. “HSI-TransUNet: A Transformer Based Semantic Segmentation Model for Crop Mapping from UAV Hyperspectral Imagery.” Computers and Electronics in Agriculture 201:107297. https://doi.org/10.1016/j.compag.2022.107297
  • Novelli, Antonio, Manuel A. Aguilar, Abderrahim Nemmaoui, Fernando J. Aguilar, and Eufemia Tarantino. 2016. “Performance Evaluation of Object Based Greenhouse Detection from Sentinel-2 MSI and Landsat 8 OLI Data: A Case Study from Almería (Spain).” International Journal of Applied Earth Observation and Geoinformation 52:403–411. https://doi.org/10.1016/j.jag.2016.07.011
  • Ou, Cong, Jianyu Yang, Zhenrong Du, Yiming Liu, Quanlong Feng, and Dehai Zhu. 2019. “Long-Term Mapping of a Greenhouse in a Typical Protected Agricultural Region Using Landsat Imagery and the Google Earth Engine.” Remote Sensing 12 (1): 55. https://doi.org/10.3390/rs12010055
  • Qin, Xuebin, Zichen Zhang, Chenyang Huang, Chao Gao, Masood Dehghan, and Martin Jagersand. 2019. “BASNet: Boundary-Aware Salient Object Detection.” 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), 7471–7481. https://ieeexplore.ieee.org/document/8953756/.
  • Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. 2015. “U-Net: Convolutional Networks for Biomedical Image Segmentation.” Medical Image Computing and Computer-Assisted Intervention 9351:234–241. https://doi.org/10.1007/978-3-319-24574-4_28.
  • Wu, Chanfan, Jinsong Deng, Ke Wang, Ligang Ma, and Tahmassebi Amir Reza Shah. 2016. “Object-based Classification Approach for Greenhouse Mapping Using Landsat-8 Imagery.” International Journal of Agricultural and Biological Engineering 9 (1): 79–88. https://doi.org/10.3965/j.ijabe.20160901.1414.
  • Yamada, Y., and M. Otani. 2022. “Does Robustness on ImageNet Transfer to Downstream Tasks?” 2022 IEEE/CVF conference on computer vision and pattern recognition (CVPR), 9205–9214. https://ieeexplore.ieee.org/document/9878666.
  • Yang, Dedi, Jin Chen, Yuan Zhou, Xiang Chen, Xuehong Chen, and Xin Cao. 2017. “Mapping Plastic Greenhouse with Medium Spatial Resolution Satellite Data: Development of a new Spectral Index.” ISPRS Journal of Photogrammetry and Remote Sensing 128:47–60. https://doi.org/10.1016/j.isprsjprs.2017.03.002
  • Zhang, Xiaoping, Bo Cheng, Jinfen Chen, and Chenbin Liang. 2021. “High-Resolution Boundary Refined Convolutional Neural Network for Automatic Agricultural Greenhouses Extraction from GaoFen-2 Satellite Imageries.” Remote Sensing 13 (21): 4237. https://doi.org/10.3390/rs13214237
  • Zhang, Lei, and Xinbo Gao. 2022. “Transfer Adaptation Learning: A Decade Survey.” IEEE Transactions on Neural Networks and Learning Systems, 1–22. https://ieeexplore.ieee.org/document/9802910.
  • Zheng, Sixiao, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, et al. 2021. “Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.” 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), 6877–6886. https://ieeexplore.ieee.org/document/9578646.