Research Article

Histogram matching-based semantic segmentation model for crop classification with Sentinel-2 satellite imagery

Lijun Wang, Yang Bai, Jiayao Wang, Zheng Zhou, Fen Qin & Jiyuan Hu
Article: 2281142 | Received 28 Apr 2023, Accepted 03 Nov 2023, Published online: 16 Nov 2023

ABSTRACT

Accurate and near-real-time crop mapping from satellite imagery is crucial for agricultural monitoring. However, the seasonal nature of crops makes it difficult to rely on traditional machine learning methods and on samples generated previously within specific domains. In this study, we improved the histogram matching method for color correction of multi-temporal images and tested the performance and prediction accuracy of three semantic segmentation models trained on weak samples. Classification experiments were conducted for nine categories in two cities in Henan province from 2019 to 2022 using 10 m resolution Sentinel-2 images with different feature selection schemes. We trained the models using classified and recorrected results from four selected sites in 2019 and 2020, and designed experiments to assess the performance of the improved histogram matching method and to verify the transferability of the semantic segmentation models across regions and years. The experimental results showed that the UNet++ model with feature selection and the improved histogram matching method outperformed other models, such as DeepLab V3+ and UNet, in crop classification transfer cases, with better model performance and higher classification accuracy. Without additional training samples, the UNet++ model achieved optimal overall accuracy, Kappa coefficient, and mean F1-score values from 2019 to 2022, exceeding 87%, 82%, and 65%, respectively. Moreover, the representative errors of the weak samples and the prediction results were analyzed to improve model robustness. As an application of transfer learning in crop mapping, the proposed model effectively addresses the classification of multispectral satellite imagery with missing labels.

1. Introduction

Agricultural practices and food security are among the main issues of global concern (e.g. for the Food and Agriculture Organization). Because crops are seasonal, ground data on crops have a short period of validity. Obtaining the spatiotemporal distribution of crops from a few available ground samples is therefore of great significance for agricultural monitoring and yield estimation (Wang et al. Citation2022a). Remote sensing (RS) satellites provide multi-sensor and multi-scale data sources for target recognition and land-use classification; combined with appropriate classifiers, they can serve near-real-time or long time-series vegetation monitoring tasks. Zhang et al. (Citation2022) detected forest damage based on Sentinel-2 images and a UNet++ network to formulate proactive prevention and control strategies. Wang et al. (Citation2022b) used Gaofen and Sentinel-2 multispectral satellite images to classify crops in complex agricultural areas, providing valuable data for agricultural management. Song et al. (Citation2021) evaluated the applicability of Landsat, Sentinel, and the Moderate Resolution Imaging Spectroradiometer in national-scale crop mapping. In particular, the Sentinel-2 satellites developed by the European Space Agency provide optical images with 13 bands and a fused spatial resolution of 10 m for monitoring agricultural practices, especially crop classification (Martinez et al. Citation2021; Wang et al. Citation2022a).

However, most classification methods require massive ground surveys and samples, and obtaining such data sets under smallholder patterns incurs high time and labor costs (Xu et al. Citation2022). Kaiser et al. (Citation2017) highlighted a limitation of deep learning models: their heavy dependence on large labeled data sets that are resource-intensive and time-consuming to acquire. They also found that deep learning networks perform rather robustly against noise in the training labels, which opens up the intriguing possibility of obtaining samples from existing maps or supervised classification results. Labeled data sets that are not completely reliable and may even contain misclassified pixels are commonly referred to as weak samples (Hao et al. Citation2020; Kaiser et al. Citation2017; Paris and Bruzzone Citation2021). It is therefore essential to establish and evaluate a deep-learning model that can automatically extract detailed crop information from images using a few weak samples (Diakogiannis et al. Citation2020; Khan, Fraz, and Shahzad Citation2021; Li et al. Citation2020). Automating this process can provide near-real-time crop spatial distribution products and enhance the application efficiency of digital agriculture.

Crop classification from RS imagery is challenging, particularly across different topographic regions with widespread and scattered smallholders in China. Furthermore, modern technologies such as soil improvement and interplanting make agricultural activities highly dynamic, imposing higher requirements on satellite sensors (Martinez et al. Citation2021). Several studies have demonstrated that the choice of classifier and the feature selection of satellite images are critical factors affecting classification results (Wang et al. Citation2022a; Martinez et al. Citation2021; Xu et al. Citation2022). Furthermore, the three red-edge bands and the narrow near-infrared band of Sentinel-2 data are sensitive to crop classification in complex agricultural areas (Khan, Fraz, and Shahzad Citation2021; Waldner and Diakogiannis Citation2020; Wang et al. Citation2022a; Yuan, Shi, and Gu Citation2021). Comparing Sentinel-2 with Landsat and MODIS satellite data, Song et al. (Citation2021) recognized the red-edge and short-wave infrared bands of Sentinel-2 imagery as more valuable for corn and soybean classification. Wang et al. (Citation2022) employed time-series Sentinel-2 images to map corn, soybeans, and other categories with an overall accuracy exceeding 85%, demonstrating the significance of the red-edge bands and vegetation indices in classification tasks. Wang et al. (Citation2021) applied multi-temporal Sentinel-2 images to classify corn, cotton, soybean, winter wheat, and other categories, and highlighted the impact of differences in crop growth periods on feature selection. Collectively, these studies furnish valuable insights for the successful application of Sentinel-2 images in agricultural monitoring and crop classification.

In data preprocessing, the feature selection strategy plays an essential role in improving classification accuracy from single or multi-temporal RS images. However, differing phenological periods or imaging conditions lead to feature differences between multi-temporal or simultaneously acquired images. Color equalization algorithms should adjust such gray-level or color differences to keep images within a relatively consistent brightness and color range in the application scene. As concluded by Cao et al. (Citation2015), multispectral image matching methods can be classified into two categories: approaches based on filtering (such as wavelet transformation and Fourier transformation) and approaches based on statistics (such as moment matching and histogram matching). The histogram matching (HM) method is more popular because of its ability to enhance image contrast and correct for nonlinear detector responses (Cao et al. Citation2015). For multi-crop classification, the performance of the HM approach in processing multiple features (including spectral, vegetation index, and texture features) and its impact on the classifier model need further testing.

Given the intraclass and interclass diversity of crops across seasons, the quality of labeled sample data sets is significantly influenced by mixed pixels and small parcels, and past samples lose their validity over time (Paris and Bruzzone Citation2021; Zhu et al. Citation2022). The classification accuracy of RS images using machine learning algorithms or deep learning networks is affected by the quality of training samples. Although several countries have published agricultural data sets, such as the Cropland Data Layer and the Crop Inventory, comparable data are not readily available in China due to the smallholder planting mode (Wang et al. Citation2022b). Kaiser et al. (Citation2017) used OpenStreetMap weak samples to extract buildings and roads and pointed out that high-precision models can be trained using weak samples containing some errors. Hao et al. (Citation2020) selected the Cropland Data Layer and Sentinel-2 images as a training data set to identify crops in three test regions, with an overall classification accuracy higher than 86%. Paris and Bruzzone (Citation2019) used weakly labeled samples to train a convolutional neural network (CNN) and demonstrated its effectiveness in land-cover classification. These studies proved the applicability of weakly labeled samples in RS classification without considering the feature differences between multi-temporal images. Similarly, crop classification results obtained using traditional pixel-based machine learning algorithms (such as support vector machine and random forest) can be regarded as weak samples after the "salt and pepper" phenomenon is processed. Combining feature selection and HM methods, the performance of such non-100%-accuracy samples in different semantic segmentation networks needs further testing and analysis.

Deep learning has been widely applied in RS image recognition and classification thanks to its effectiveness in deep feature extraction and model generalization on complex data (Perantoni and Bruzzone Citation2022; Wang et al. Citation2022b). Semantic segmentation is a critical pixel-level classification method in deep learning and has been employed to address multi-scale RS problems in diverse applications (Waldner and Diakogiannis Citation2020; Wang et al. Citation2021, Citation2022a). The advantage of semantic segmentation-based methods mainly lies in the powerful encoder that extracts complex features from a few samples (Wang et al. Citation2022b). Among semantic segmentation networks, the encoder-decoder architecture is the most commonly used, mainly because of its powerful multi-level learning capacity, especially in the intelligent interpretation of RS images.

Crop mapping is a pixel-wise classification task that uses RS imagery to generate thematic maps depicting the spatiotemporal distribution of identifiable agricultural planting types (Li et al. Citation2022). Wang et al. (Citation2022a) classified seven summer crops based on Sentinel-2A data sets with 14 feature bands and found that the UNet++ model exhibited the optimal performance, with an overall accuracy above 91%, although the feature differences between images from multiple phenological periods were ignored. Martinez et al. (Citation2021) tested a hybrid deep learning architecture on time-series SAR data sets and showed its advantages in N-to-N crop classification in tropical sites. However, due to climatic factors such as continuous precipitation, obtaining time-series optical satellite images within short crop growth periods is often difficult. Adrian et al. (Citation2021) fused Sentinel SAR and optical images and achieved higher training performance with the UNet model, with an overall accuracy of 0.992, indicating that semantic segmentation models can improve crop classification accuracy. In addition, the attention mechanism is often introduced to improve the efficiency and quality of deep learning networks through its effective use of the spatial and semantic information of RS images. Several studies have shown that attention mechanisms, such as self-attention and the Spatial and Channel Squeeze & Excitation Block (scSE), can effectively segment land cover, crop, road, and water information in medium- and high-resolution images (Ienco et al. Citation2019; Martinez et al. Citation2021; Qi et al. Citation2020; Xu et al. Citation2020; Zhang et al. Citation2022). Spectral, spatial, and textural features provide additional information for crop classification in complex agricultural areas, and semantic segmentation models have shown strong generalization ability in the intelligent classification of multi-feature RS images. These studies provide a reference for experiments with weak samples in different deep learning networks. For semantic segmentation models in particular, generalization ability and applicability remain challenging problems, and in most cases feature selection and image matching should be performed on weak samples before such networks are trained and tested.

This study aims to assess the performance of feature-selection schemes combined with the improved HM method in different semantic segmentation networks with attention mechanisms, based on weak samples. The assessment included (1) comparing the preprocessing performance of the improved HM method on multi-temporal Sentinel-2 images; (2) evaluating the training performance of different schemes in semantic segmentation models based on weak samples; and (3) selecting the optimal training model to evaluate prediction accuracy and model generalization ability. Overall, the proposed method combines ideas distilled from image consistency processing and semantic segmentation, and we demonstrate its applicability.

2. Study sites and data

2.1. Study sites

This study focused on classifying the main crops in the central part of Henan province, China (Figure 1). The area belongs to the north warm temperate continental monsoon climate zone, with an annual precipitation of about 700 mm and an annual average temperature of ~14°C. Precipitation mainly occurs from July to September, and the simultaneous heat and precipitation promote crop growth while limiting the acquisition of multi-temporal optical images. The summer crops in this area are diverse, and their growing season is ~110 days, from early June to mid-September. As shown in Figure 1, four sites in Xuchang city, each covering ~20 × 20 km, were chosen to generate weak samples in 2019 and 2020 as the training data set. The sample plots and validation sample points were used to assess the prediction accuracy of the optimal training model. The prediction results for both cities from 2019 to 2022 were used to test the model generalization and transfer learning ability.

Figure 1. Geographic location, terrain, samples, and training sites in the central part of Henan province, China.


2.2. Datasets

2.2.1. Sentinel-2 data

Because crops such as soybean and tobacco have low vegetation cover in their early stages, we collected Sentinel-2 images of the late growth stage, from late July to early September of each year, to reduce the influence of soil (Table 1). The Google Earth Engine (GEE) platform allows convenient acquisition of high-quality cloudless images together with their vegetation indices and texture features (Adrian, Sagan, and Maimaitijiang Citation2021), and provides calibrated and corrected Sentinel Level-2A bottom-of-atmosphere reflectance data. Sustained precipitation in August made it difficult to obtain cloudless, high-quality images during this period. As listed in Table 2, the two schemes include 10 and 18 features, respectively. The coastal aerosol, water vapor, and cirrus bands with 60 m resolution were removed, and the resolution of the preprocessed images was 10 m.

Table 1. Sentinel-2 images from 2019 to 2022 used in this study. The satellite orbit numbers, such as T49SFT, follow the U.S. Military Grid Reference System (US-MGRS).

Table 2. Sentinel-2 schemes with feature selection. For spectral features, the center wavelength with nm unit is listed.

Currently, machine learning methods are commonly used for feature importance evaluation and selection. Wang et al. (Citation2022b) employed the XGBoost algorithm, a feature importance evaluation method based on the Gradient Boosting Decision Tree, to select 14 features from 38 feature sets for classifying crops in complex agricultural areas. Random forest and feature sensitivity analysis have also been adopted to select feature subsets (Izquierdo-Verdiguier and Zurita-Milla Citation2020; Wang et al. Citation2021). Based on prior knowledge and the role of different vegetation indices in classification, we selected 8 additional features derived from Sentinel-2 images. Given the varying abilities of feature variables to discriminate categories, selecting suitable features through analysis of the training data set can enhance the explanatory power of the classifier. As listed in Table 2, scheme 1 consists of only the 10 spectral features of the Sentinel-2 images, whereas scheme 2 augments scheme 1 with an additional 5 vegetation indices and 3 texture features. By comparing model performance and classification accuracy, the importance of these feature combinations for crop classification in complex agricultural planting areas is tested. The five vegetation indices in scheme 2 (detailed in the supplementary material of Chaves et al. (Citation2020)) are the normalized difference vegetation index (NDVI), used to distinguish vegetation from non-vegetation; the modified normalized difference water index (MNDWI), used to recognize water; the normalized difference built-up index (NDBI), used to identify buildings; the modified soil-adjusted vegetation index (MSAVI), used to reduce soil influence; and the red-edge NDVI (NDVIRE), used to reflect crop growth. The texture features, based on the near-infrared band, were derived with the gray-level co-occurrence matrix in GEE; the sum average (SAVG), correlation (CORR), and dissimilarity (DISS) were selected for scheme 2 (Wang et al. Citation2022b, Citation2022a). A sketch of how these features could be derived is given below.
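As an illustration, the following sketch outlines how the scheme-2 indices and GLCM textures could be computed with the GEE Python API. The Sentinel-2 band mapping (e.g. B8 for NIR, B5 for RE1), the MSAVI expression, and the 8-bit rescaling before the texture computation are standard conventions and our own assumptions, not the paper's exact implementation.

```python
import ee

ee.Initialize()

def add_scheme2_features(img):
    # Vegetation indices of scheme 2 (standard Sentinel-2 band mapping assumed).
    ndvi = img.normalizedDifference(['B8', 'B4']).rename('NDVI')
    mndwi = img.normalizedDifference(['B3', 'B11']).rename('MNDWI')
    ndbi = img.normalizedDifference(['B11', 'B8']).rename('NDBI')
    ndvire = img.normalizedDifference(['B8', 'B5']).rename('NDVIRE')
    # MSAVI expects reflectance in [0, 1], so the L2A bands are divided by 10000.
    n = img.select('B8').divide(10000)
    r = img.select('B4').divide(10000)
    msavi = img.expression(
        '(2*N + 1 - sqrt((2*N + 1)*(2*N + 1) - 8*(N - R))) / 2',
        {'N': n, 'R': r}).rename('MSAVI')
    # GLCM textures on the near-infrared band; glcmTexture needs an integer
    # image, so the reflectance is rescaled to 8 bit first.
    nir8 = img.select('B8').unitScale(0, 10000).multiply(255).toByte()
    glcm = nir8.glcmTexture(size=3)
    texture = glcm.select(['B8_savg', 'B8_corr', 'B8_diss'])
    return ee.Image.cat([img, ndvi, mndwi, ndbi, ndvire, msavi, texture])
```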

2.2.2. Sample data

We used a Jisibao UG905 handheld Global Positioning System receiver to collect ground samples in mid-July of every year. Four sample plots, each covering ~1 km2 and totaling 4.11 km2, were investigated yearly. Figure 2 shows the sample plots in different years. In addition to the sample plots, we recorded the locations and types of typical crops along the way. These samples were used to establish an interpretation mark library for manual sample selection. In this study, we classified crops and land use into 9 categories: corn, soybean, peanut, tobacco, non-cultivated land (including harvested and abandoned cropland, NCL), other crops (such as pepper, rehmannia, and vegetables), forest land (including nursery stock planted on cultivated land), urban (including greenhouses, roads, and buildings), and water.

Figure 2. Images, sample plots, and weak labels of different years. (a) sample plot (S1) in 2019; (b) sample plot (S2) in 2020; (c) sample plot (S4) in 2021; (d) sample plot (S2) in 2022; (e) site A in 2019; (f) site B in 2019; (g) site C in 2020; (h) site D in 2020.


Based on the ground samples and manually selected sample points, classification results for the four sites (Figure 1) in 2019 and 2020 were produced using the random forest method in GEE. However, the "salt and pepper" problem in these results made it impossible to use them directly as label data for the model. Therefore, correction steps, including format conversion, single-pixel elimination, and manual editing, were applied to improve the accuracy of the weak labels. This process included (1) converting raster classification results (in TIF format) to vector data (in SHP format); (2) merging each single pixel with the neighboring polygon sharing the longest boundary; (3) manually editing significant misclassifications; and (4) selecting random points to verify the results. These operations were performed in ArcMap (Environmental Systems Research Institute, Redlands, California, USA). We selected 20 random crop sample points for each site every year and found a total of 13 misclassifications, such as soybean confused with other crops, greenhouses with non-cultivated land, and peanut with sweet potato. Finally, this non-100%-accuracy data set, shown in Figure 2, was converted to TIF format with the same coordinate system and resolution as the corresponding image, and used as weak labels for model training and performance evaluation. A raster analogue of the single-pixel elimination step is sketched below.
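The single-pixel elimination above was done on vector data in ArcMap. As a rough raster analogue (our own assumption, not the authors' tool chain), isolated one-pixel regions can be reassigned to the dominant neighboring class:

```python
import numpy as np
from scipy import ndimage

def remove_single_pixels(labels):
    # Reassign isolated one-pixel regions to the most frequent class among
    # their 8 neighbours: a raster analogue of merging a single pixel with
    # the neighbouring polygon sharing the longest boundary.
    out = labels.copy()
    for cls in np.unique(labels):
        regions, n = ndimage.label(labels == cls)
        sizes = ndimage.sum(labels == cls, regions, range(1, n + 1))
        for region_id in np.where(sizes == 1)[0] + 1:
            r, c = np.argwhere(regions == region_id)[0]
            window = labels[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            neigh = window[window != cls]
            if neigh.size:
                vals, counts = np.unique(neigh, return_counts=True)
                out[r, c] = vals[np.argmax(counts)]
    return out
```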

In addition to the four sample plots (S1–S4 in Figure 1), the study area was divided into regular 5 × 5 km grids. We selected a crop point in each grid and assigned attribute values to evaluate the prediction accuracy and spatiotemporal generalization capability of the different semantic segmentation models without training samples. The distribution of validation sample points is shown in Figure 1. Water samples were not included in the validation data set because water is relatively easy to extract from images, and this study mainly focused on crop classification in complex agricultural areas.

2.2.3. Model data set processing

The initial step is to select the reference image and use histogram matching to process the Sentinel-2 images. These matched images were then scaled to 8 bit, and the bands were stretched to satisfy the input requirements of the semantic segmentation networks. The data stretching method used in this study is linear stretching, as provided by the ENVI software. Finally, the preprocessed images with a resolution of 10 m and the weak labels of the four sites in 2019 and 2020 were clipped into patches of 256 × 256 pixels using a regular-grid algorithm with zero overlap. The remaining matched images in the study area were trimmed similarly and used for model prediction. When a remaining image edge was smaller than 256 pixels, it was filled with the background value to complete the clipping process (see the sketch below). Notably, the clipped images kept the same spatial position and coordinate system as the original images so that the predicted classification results could be quickly mosaicked during postprocessing. A total of 624 pairs of training data were obtained in this study. The pixel proportions of corn, peanut, soybean, tobacco, non-cultivated land, other crops, forest land, urban, and water were 40.95%, 5.20%, 7.89%, 1.18%, 0.68%, 8.03%, 13.36%, 21.81%, and 0.90%, respectively. Because of the low proportions of tobacco and non-cultivated land, the class imbalance had to be addressed to reduce its impact on the semantic segmentation models (Wang et al. Citation2022b). Although the proportion of water pixels is also low, the broad spatial distribution and distinctive spectral features of water meant that upsampling could be avoided.
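A minimal NumPy sketch of the regular-grid clipping with background padding is given below; the function name and the band-first array layout are illustrative assumptions.

```python
import numpy as np

def clip_to_patches(image, patch=256, fill=0):
    # Regular-grid clipping with zero overlap; ragged right/bottom edges are
    # padded with a background value so every patch is patch x patch pixels.
    bands, h, w = image.shape
    rows, cols = -(-h // patch), -(-w // patch)   # ceiling division
    padded = np.full((bands, rows * patch, cols * patch), fill, image.dtype)
    padded[:, :h, :w] = image
    return [padded[:, i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
            for i in range(rows) for j in range(cols)]
```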

3. Methodology

3.1. Improved histogram matching method

Histogram matching (HM) can be used as a lightweight normalization for image processing tasks such as feature matching, especially for images from different regions or phenological periods (Horn and Woodham Citation1979). The traditional HM algorithm matches the cumulative probability distribution function of the target detector to that of the reference detector (Cao et al. Citation2015). HM manipulates the pixels of a target image so that its histogram matches the histogram of the reference image. Unlike grayscale or RGB images, RS images usually have more channels and a different range of feature values in each channel. Therefore, HM is applied independently to each channel (band), provided that the target and reference images have the same number of channels (bands). The steps of the improved HM algorithm proposed in this paper are as follows:

(1) Even after radiometric calibration and atmospheric correction, satellite images still contain noise and feature outliers. In particular, a small number of high reflectance values are often caused by bright artificial objects or clouds. Therefore, for the histogram of band i (Bi), we traverse the pixel counts of the reflectance values in increasing order. If, starting from a certain reflectance value (rmax), the pixel count stays below a given threshold (Tg = 20) for 10 consecutive values, then the reflectance of all pixels greater than rmax is reset to rmax.

(2) For the input image, the probability density function of reflectance value rk in Bi is given by Equation (1).

$$p(r_k) = \frac{n_k}{N} \tag{1}$$

where $n_k$ is the frequency of reflectance value $r_k$ in $B_i$, and $N$ is the total number of pixels; $p(r_k)$ is the probability of $r_k$, with values in the range [0, 1].

(3) For the input image, this probability density function is mapped to its cumulative distribution function by Equation (2).

$$S(r_k) = \sum_{j=0}^{k} p_r(r_j), \quad k = 0, 1, 2, \ldots, L-1 \tag{2}$$

where $S(r_k)$ is the cumulative distribution of $r_k$ in $B_i$, and $L$ is the total number of gray levels.

(4) Similarly, for the reference image, the cumulative distribution of gray level $z_k$ in $B_i$ is determined by Equation (3).

$$G(z_k) = \sum_{j=0}^{k} p_z(z_j), \quad k = 0, 1, 2, \ldots, L-1 \tag{3}$$

where $G(z_k)$ is the cumulative distribution of $z_k$ in $B_i$ of the reference image.

(5) Finally, solving $S(r_k) = G(z_k)$ maps each input gray level to an output level, yielding an output image whose histogram approximates the reference histogram. A minimal sketch of the whole procedure follows.
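The NumPy sketch below implements steps (1)–(5) under our own assumptions: the function names, the per-band loop, and the interpolation-based solution of $S(r_k) = G(z_k)$ are illustrative choices, and the target and reference images are assumed to share the same band order.

```python
import numpy as np

def truncate_outliers(band, tg=20, run=10):
    # Step (1): find the first reflectance value r_max from which the pixel
    # counts stay below tg for `run` consecutive values, then clip above it.
    values, counts = np.unique(band, return_counts=True)
    consecutive, r_max = 0, values[-1]
    for i, c in enumerate(counts):
        consecutive = consecutive + 1 if c < tg else 0
        if consecutive == run:
            r_max = values[i - run + 1]
            break
    return np.minimum(band, r_max)

def match_band(source, reference):
    # Steps (2)-(5): build both cumulative distributions and solve
    # S(r_k) = G(z_k) by interpolating into the reference CDF.
    src_vals, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    ref_vals, ref_counts = np.unique(reference.ravel(), return_counts=True)
    s = np.cumsum(src_counts) / source.size       # S(r_k), Equation (2)
    g = np.cumsum(ref_counts) / reference.size    # G(z_k), Equation (3)
    return np.interp(s, g, ref_vals)[src_idx].reshape(source.shape)

def improved_hm(target, reference, tg=20):
    # HM is applied independently per band (axis 0 holds the bands).
    out = np.empty(target.shape, dtype=np.float64)
    for b in range(target.shape[0]):
        out[b] = match_band(truncate_outliers(target[b], tg), reference[b])
    return out
```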

In this paper, performance metrics including the histogram, mean, variance, and mean gradient are used to compare image quality before and after HM, following existing works (Cao et al. Citation2015; Jayasankari and Domnic Citation2020).

3.2. Modeling using different semantic segmentation networks

3.2.1. Semantic segmentation models based on FCN

The fully convolutional network (FCN), employing an encoder-decoder structure, learns features from multi-channel RS images and accommodates input images of arbitrary size (Zhu et al. Citation2017). Several FCN-based semantic segmentation models have been constructed to achieve pixel-level image classification, such as UNet (Ronneberger, Fischer, and Brox Citation2015), DeepLab V3 (Chen et al. Citation2018), and SegNet (Badrinarayanan, Handa, and Cipolla Citation2015). As shown in Figure 3, skip connections are introduced to fuse information from different depths and recover the fine-grained spatial information lost during downsampling (Zhu et al. Citation2017). Upsampling layers and a softmax in the last layer of the decoder restore the downsampled feature maps to the original image size.

Figure 3. Remote sensing image classification process based on encoder-decoder architecture.


3.2.2. Semantic segmentation networks selection

In this study, we conducted model training and performance evaluation on the preprocessed data set with the DeepLab V3+, UNet, and UNet++ networks. Figure 4(a) illustrates the UNet network (Ronneberger, Fischer, and Brox Citation2015), which consists of downsampling and upsampling layers. UNet++ (Figure 4(b)), as modified by Zhou et al. (Citation2018), incorporates dense skip connections to extract multi-scale feature maps from multi-level convolution pathways. DeepLab V3+ (Chen et al. Citation2018) (Figure 4(c)) improves segmentation performance by using Atrous Spatial Pyramid Pooling (ASPP) to resample features at different scales. Specifically, we appended a combined attention module in the decoder part to obtain fine-grained semantic segmentation improvements. The Spatial and Channel Squeeze & Excitation Block (scSE, Figure 5), described in detail by Roy et al. (Citation2018), enables the learning of more meaningful feature maps that are relevant both spatially and channel-wise; a compact sketch is given below. Additionally, a RegNetY_320 backbone pre-trained on the ImageNet data set was adopted in the networks to extract the multi-level feature maps.
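For reference, a compact PyTorch version of the scSE block is sketched below, following Roy et al. (Citation2018); the reduction ratio and layer layout are commonly used defaults, not values reported in this paper.

```python
import torch.nn as nn

class SCSE(nn.Module):
    # Concurrent spatial and channel squeeze & excitation (Roy et al. 2018):
    # two parallel gates recalibrate the feature map channel-wise and
    # spatially, and their outputs are summed.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.cse = nn.Sequential(                 # channel squeeze & excitation
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid())
        self.sse = nn.Sequential(                 # spatial squeeze & excitation
            nn.Conv2d(channels, 1, 1),
            nn.Sigmoid())

    def forward(self, x):
        return x * self.cse(x) + x * self.sse(x)
```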

Figure 4. Different semantic segmentation models. (a) UNet; (b) UNet++; (c) DeepLab V3+.


Figure 5. Spatial and Channel Squeeze & Excitation Block (scSE) (modified from Roy et al. (Citation2018)).


3.2.3. Model parameters and training environment

To test the performance of the preprocessed images and weak labels, Table 3 lists the parameters common to all the semantic segmentation networks. We selected stochastic gradient descent (SGD) as the optimizer to adjust the parameters after each sample and accelerate convergence. In the scheduler, T_0 is the number of epochs for the first restart, and T_mult is the factor controlling how the learning-rate restart period grows. For the multiple classes, we adopted the joint loss function of the label-smoothing cross-entropy loss (LSCEloss) and the Dice coefficient loss (DCloss), following Wang et al. (Citation2022b); the smoothing factor was set to 0.1 in LSCEloss. The joint loss and mean Intersection over Union (mIoU) values were used to evaluate model performance, and training was terminated if neither value improved for 10 epochs. Because of the small sample size, all 624 data pairs were used for model training and validation. In particular, random linear stretching of 0.5%, 1%, or 2% was applied to reprocess the input images when evaluating the validation accuracy of the model during training. Wang et al. (Citation2022a) pointed out that upsampling small classes in imbalanced data can improve model performance and classification accuracy. Hence, three-fold upsampling was used to increase the number of minority-class samples: an augmented data set was generated by copying the patches of the specified categories in the training data set multiple times, and model training was subsequently conducted on this augmented data set. All experiments were performed on an Ubuntu workstation with a single NVIDIA Tesla T4 graphics card (16 GB RAM), using the PyTorch and GDAL packages on the Python platform. A sketch of this training configuration is given below.
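The following is a hedged sketch of how this configuration could be assembled with PyTorch and the segmentation_models_pytorch package. The learning rate, momentum, T_0, and T_mult values are placeholders for the settings in Table 3, and the scheduler choice (CosineAnnealingWarmRestarts, whose parameters are named T_0 and T_mult) is inferred from the text rather than stated by the authors.

```python
import torch
import segmentation_models_pytorch as smp

# UNet++ with a RegNetY_320 backbone, scSE attention in the decoder,
# 18 input features (scheme 2), and 9 output classes.
model = smp.UnetPlusPlus(
    encoder_name="timm-regnety_320", encoder_weights="imagenet",
    decoder_attention_type="scse", in_channels=18, classes=9)

# SGD optimizer and warm-restart scheduler; lr, momentum, T_0, and T_mult
# are illustrative placeholders for the values listed in Table 3.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=10, T_mult=2)

# Joint loss: label-smoothing cross-entropy (smooth factor 0.1) plus Dice.
lsce = smp.losses.SoftCrossEntropyLoss(smooth_factor=0.1)
dice = smp.losses.DiceLoss(mode="multiclass")

def joint_loss(logits, target):
    return lsce(logits, target) + dice(logits, target)
```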

Table 3. Parameters of the semantic segmentation networks.

3.3. Evaluation indicators

The evaluation indicators for prediction classification included the overall accuracy (OA), mean F1-score (F1-score), and Cohen's kappa coefficient (Kappa) (Diakogiannis et al. Citation2020; Wang et al. Citation2022a; Xu et al. Citation2020). The user's accuracy (UA) and producer's accuracy (PA) were used to evaluate the per-crop accuracy of the optimal semantic segmentation model. Finally, the landscape indicators patch density (PD) and edge density (ED) were used to evaluate the patch fragmentation of the optimal classification results. A greater patch density indicates stronger overall heterogeneity and higher fragmentation, while a greater edge density indicates more complex edge shapes in the crop and land-use classification (Su et al. Citation2022). An illustrative computation of the accuracy indicators is sketched below.
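As an illustration (assuming scikit-learn and flattened per-pixel label arrays), the accuracy indicators can be computed as follows; PD and ED come from landscape-metrics tooling and are omitted here.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             confusion_matrix, f1_score)

def evaluate(y_true, y_pred):
    # OA, Kappa, mean F1, and per-class PA/UA from the confusion matrix.
    oa = accuracy_score(y_true, y_pred)
    kappa = cohen_kappa_score(y_true, y_pred)
    mean_f1 = f1_score(y_true, y_pred, average="macro")
    cm = confusion_matrix(y_true, y_pred)
    pa = np.diag(cm) / cm.sum(axis=1)   # producer's accuracy (recall)
    ua = np.diag(cm) / cm.sum(axis=0)   # user's accuracy (precision)
    return oa, kappa, mean_f1, pa, ua
```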

4. Results

4.1. Images and histogram based on improved HM method

The reference image, target images, and images processed with the improved HM, together with their histograms, are shown in Figures 6 and 7. The reference image selection fully considered the terrain, crop phenology, crop types, and image quality. The reference image is part of orbit number T49SGT, obtained on 16 August 2019, and all images were preprocessed for reflectance truncation and histogram matching against it. At that date, crops such as corn, peanut, and soybean are in a rapid growth stage with high vegetation coverage. For comparison, the images of orbit number T49SGU on 4 September 2020 and orbit number T49SGT on 31 July 2021 were selected. Figure 6 shows that the colors of the improved HM images are more consistent with the reference image; in particular, the contrast of the image obtained on 31 July 2021 is significantly enhanced. The colors of the reference image and the matched images appear visually similar because of the similarity of vegetation characteristics in the same band combination after HM processing. The emerald-colored areas in the southern region of the target images are primarily dominated by soybean cultivation. At this stage, soybean has limited vegetation coverage and is influenced by the soil spectrum, which closely resembles the spectral characteristics of bare land. Consequently, these regions exhibit bare-land spectral characteristics in the false-color composite, which increases the contrast between vegetation and soil to a certain extent. The improved HM method not only makes the colors of the Sentinel-2 images consistent but also improves the visual information of the images.

Figure 6. Reference image, target images, and images with improved HM method. (a,c) the target image (a) and improved HM-based result (c) on September 4, 2020. (b,d) the target image (b) and improved HM-based result (d) on July 31, 2021. The R/G/B band selection of the Sentinel-2 images was NNIR/SWIR1/RE1.



Figure 7. Histogram results of 10 spectral bands of the images shown in Figure 6. The (a), (b), (c), and (d) correspond to those in Figure 6(a-d).

In addition, Figure 7 compares the histograms of the 10 spectral bands of the original and preprocessed images. The original histograms of the RE3, NIR, and NNIR bands exhibit significant differences, caused not only by the different acquisition dates of the original images but also by differences in ground-object types and pixel numbers. The histograms of the images matched to the reference image with the improved HM show remarkable similarity (Figure 7). Compared with the original histograms, the reflectance curves of the B, G, R, RE1, and SWIR2 bands are more concentrated, while those of the RE2, RE3, NIR, NNIR, and SWIR2 bands vary with two peaks. The improved HM method therefore made the features of the multi-temporal images more consistent.

4.2. Model performance evaluation

The training data set of 624 pairs was increased to 1905 pairs by three-fold upsampling of tobacco and non-cultivated land. The epochs, joint loss, and mIoU results of the different schemes and semantic segmentation networks trained on weak data sets are shown in Figure 8. The total epochs of all models are very close, within 70–72. For the same feature scheme and network, the training data processed with the improved HM method yield lower loss and higher mIoU values. The different networks reached their optimal performance at around the 61st epoch. Across both feature schemes, the curves of DeepLab V3+ change the least and those of UNet change the most, while the UNet++ model achieves the optimal performance. Affected by the sample accuracy, the loss values of all training models remain relatively high; improving the accuracy and quality of the samples can enhance model performance (Wang et al. Citation2022b).

Figure 8. Training results of both feature schemes with and without improved HM method. The position of the gray line represents the optimal loss and mIoU values of the training model.


The total time (TT, min), total epochs (TE), optimal loss value (OL), optimal mIoU value (OM), and average time per epoch (AT, min) of the different schemes and models are listed in Table 4. For the same network, scheme 2 has a lower loss value and a higher mIoU value than scheme 1; feature selection thus enhances training performance. In addition, the UNet model requires the shortest total and average time to complete training, while the UNet++ model requires the longest. The UNet++ model of scheme 2 with the improved HM method obtains the lowest loss value and the highest mIoU value, 0.457 and 0.760 respectively, which are 0.009 lower and 0.017 higher than those without the improved HM method. Consequently, the improved HM method not only reduces the feature divergence of the Sentinel-2 images but also yields better model performance under the same feature scheme.

Table 4. Parameter results of different schemes and networks. The total time, total epochs, optimal loss value, optimal mIoU value, and average time per epoch are abbreviated as TT, TE, OL, OM, and AT, respectively; TT and AT are in minutes.

4.3. Classification results of the optimal semantic segmentation model

Figure 9 shows the classification results and accuracies of crops and other categories from 2019 to 2022 based on the UNet++ model. The OA values of the UNet and DeepLab V3+ models, for both schemes with or without the improved HM method, are lower than 82.50%. Consistent with previous work (Adrian, Sagan, and Maimaitijiang Citation2021; Wang et al. Citation2022b, Citation2022a; Yuan, Shi, and Gu Citation2021), better model performance (Table 4) yields higher classification accuracy. Without training samples, the UNet++ model of scheme 2 with the improved HM method yields the optimal prediction accuracies within a specific and adjacent area. The prediction of crop classification for the entire study area from 2019 to 2022 took 8.62 minutes. These prediction accuracies demonstrate the strong spatiotemporal generalizability of the UNet++ model based on feature selection and the improved HM method. The success of prediction based on semantic segmentation models can be attributed to the multi-level feature-map-generating strategy applied to multi-feature RS imagery (Diakogiannis et al. Citation2020; Paris and Bruzzone Citation2019; Zhou et al. Citation2018).

Figure 9. Prediction classification results and evaluation indicators of both feature schemes with and without improved HM method based on the UNet++ model.


The results of scheme 1 without the improved HM method in 2022 show significant misclassifications, especially for forest land. For the same scheme, the model with the improved HM method obtains higher OA, Kappa, and F1-score values. Moreover, except for the F1-score in 2021, the classification accuracy values of scheme 2 without the improved HM method exceed those of scheme 1 with the improved HM method in 2021 and 2022. For the UNet++ model, both the loss value and the mIoU value of scheme 2 without the improved HM method are 0.002 higher than those of scheme 1 with the improved HM method; models with such similar training performance can thus each hold the advantage in different prediction cases. However, according to the F1-score values from 2019 to 2022, scheme 2 without the improved HM method achieves better classification results than scheme 1 with the improved HM method, indicating that the vegetation indices and texture features effectively improve model performance and prediction accuracy. In addition, the OA, Kappa, and F1-score values of scheme 2 with the improved HM method based on the UNet++ model exceed 87%, 82%, and 65%, respectively. The proposed method therefore effectively improves both model performance and prediction accuracy.

4.4. PA and UA of crops based on the optimal semantic segmentation model

In addition to the OA, Kappa, and F1-score values, the UA and PA indicators were used to further assess the accuracy of individual categories. We analyzed the PA, UA, PD, and ED values of the optimal model's results, i.e. the prediction results of scheme 2 with the improved HM method based on the UNet++ model. As listed in Table 5, among the crops, the classification precision of soybean is relatively low, and the values of non-cultivated land, forest land, and urban are lower still. On the one hand, the validation sample pixels of these categories are fewer. On the other hand, field paths are generally only about 2 m wide and are difficult to recognize effectively under high-coverage vegetation such as corn; their distribution was nevertheless recorded during the ground survey, leading to low PA and UA values.

Table 5. PA and UA results of scheme 2 with improved HM method based on the UNet++ model.

Similarly, individual trees or tree rows in the field (such as poplar) were not recorded during the ground survey, while the shading of fields by trees made crops more likely to be mistaken for forest land. In addition, owing to the image resolution, pixels in interplanting mode are usually classified as the crop with higher coverage; for example, corn-soybean interplanting is classified as corn in the late growth stage. Hence, the combination of ground sampling practice and mixed pixels in the satellite imagery leads to low classification accuracy for minor ground objects.

The PD and ED values of the prediction results from 2019 to 2022 indicated that the heterogeneity and fragmentation of the classification landscape increased year by year, while the edge shape values showed a downward trend in 2022. As shown in Figure 9, except in 2022, the spatial distribution of soybean and other crops was relatively concentrated, and the higher ED value in 2021 reflected a more contiguous agricultural landscape. As listed in Table 5, corn, soybean, other crops, and urban had higher PD and ED values, indicating more scattered distributions and more complex edge shapes. Because of their concentrated planting, the classification accuracy of peanut and tobacco was higher than that of soybean. For minor crops, therefore, classification accuracy is greatly affected by spatial distribution.

5. Discussion

5.1. Contrast profiles of original images and improved HM images

Figures 6 and 7 show the visualization and histogram results of the original and improved HM images in 2020 and 2021. As described by Cao et al. (Citation2015), Cui et al. (Citation2017), and Helmer and Ruefenacht (Citation2006), the mean reflects the brightness of an image: the higher the value, the brighter the image. The variance reflects the dispersion of the gray levels of each pixel relative to the average gray level and is used to evaluate the amount of image information. The mean gradient reflects the ability to express the contrast of minute details in the image: the larger the value, the sharper the image. In this paper, two multispectral remote sensing image pairs were selected to test the improved HM performance; a sketch of these metrics is given below.
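As a sketch of the three metrics (with the mean-gradient formula assumed from the cited works, since the exact definition is not given here):

```python
import numpy as np

def contrast_profile(band):
    # Mean (brightness), variance (information content), and mean gradient
    # (sharpness). The gradient definition is an assumption based on the
    # cited works, averaging the local horizontal/vertical differences.
    band = band.astype(np.float64)
    gx = np.diff(band, axis=1)[:-1, :]
    gy = np.diff(band, axis=0)[:, :-1]
    mean_gradient = np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2))
    return band.mean(), band.var(), mean_gradient
```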

Figure 10 shows the change curves of the mean, variance, and mean gradient of the 10 spectral bands before and after the improved HM processing. Compared with the contrast profiles of the original images, the metric values of four spectral bands, RE2, RE3, NIR, and NNIR, improved significantly, with little change in the other bands; in particular, the mean and variance of SWIR2 even decreased. Several existing studies have shown that the red-edge, narrow near-infrared, and near-infrared bands of Sentinel-2 images play a more critical role in complex crop classification (Chaves, Picoli, and Sanches Citation2020; Portales-Julia et al. Citation2021; Song et al. Citation2021). Considering both the model performance and the prediction accuracy, the UNet++ model based on feature selection and the improved HM method can be applied to complex agricultural areas with weak samples.


Figure 10. Contrast profiles of original images and improved HM images. (a) the original image was obtained on September 4, 2020, and its improved HM-based image. (b) the original image was obtained on July 31, 2021, and its improved HM-based image. The images correspond to those in Figure 6(a-d).

5.2. Analysis of feature selection and model efficiency

The semantic segmentation networks adopt an encoder-decoder architecture to achieve N-to-N classification tasks in a complex agricultural area, and the improved HM method with the scSE-equipped UNet++ model demonstrates superiority over the other models. Training and prediction time are often used to evaluate a network. In addition to the total time and average epoch time in Table 4, Figure 11 shows the total prediction time of the three networks for each scheme with or without the improved HM method. For the same prediction data set, an increased number of image features does not lead to a significant increase in prediction time; some models, such as UNet and UNet++, even require less time. This differs from the time required for model training: although the increased features lengthen training, Table 4 shows an acceptable training time for RS based on a few weak samples. Compared with the experimental results of Wang et al. (Citation2022a) and Li et al. (Citation2021), weak samples reduce the labeling workload but lead to a higher loss value in the training model; this higher loss indicates that the problems of imbalanced classes and low-quality samples need further attention (Yuan, Shi, and Gu Citation2021). The favorable trade-off between prediction accuracy and regional model efficiency proposed in this study brings advantages, such as the potential to exploit near-real-time satellite images for cropping and land-use monitoring.

Figure 11. Total prediction time of three networks for each scheme with or without improved HM method from 2019 to 2022.


5.3. Representative error analysis of weak samples and classification results

Figure 9 and Table 5 show the classification accuracies of land-use and crop types. The factors behind the lower accuracy of soybean, non-cultivated land, forest land, and urban need further comparison and analysis. As noted by Wang et al. (Citation2022a) and Xu et al. (Citation2020), factors such as errors in the training data set, differences in crop phenology, and topographic conditions influence model performance and prediction accuracy in large-scale crop classification. In addition, Paris and Bruzzone (Citation2021) demonstrated the effectiveness of weak labels derived from thematic products in crop and land-use classification tasks, although generating large weak-label sets from RS products remains a challenge. Perantoni and Bruzzone (Citation2022) proposed a transition matrix to handle weak labels and suggested using multiple sources of weakly labeled data to train deep learning models for RS image scene classification. These experimental results demonstrate the applicability of improved weakly labeled samples in RS classification. However, beyond testing the applicability of weak samples across time and space, further analysis and processing of the representative errors in weak samples can improve the model's generalization ability and achieve higher classification accuracy.

As shown in Figure 12, the prediction results in different sample plots from 2019 to 2021 show that field roads (about 2 m wide) cannot be effectively recognized and extracted from the 10 m resolution satellite images. The soybeans in Figure 12(b) are misclassified as corn because of the influence of corn plant height (more than 2 m) in the later growth stage (4 September 2020). Therefore, the accuracy of N-to-N crop classification is affected not only by imbalanced classes, model parameters, image resolution, feature selection, and sample quality, but also by factors such as agricultural mode, crop phenology and types, field fragmentation, and the consistency of image features. Figure 12(e,f) mainly shows the classification results for the urban and water categories in other geographical areas; the results based on scheme 2 and the optimal semantic segmentation model are also satisfactory. Notably, single houses covering one or a few pixels can be effectively identified, mainly because of their distinctive spectral characteristics and the limited influence of other ground objects. Similarly, water is extracted with high classification accuracy and finer edge information. Because of their more concentrated planting mode and simpler shapes, peanut, tobacco, and water achieved higher classification accuracies, and their PD and ED values were also lower. Consequently, it is necessary to further improve the weak samples and extract finer road information from high-resolution images to reduce the representative error affecting the semantic segmentation models and classification accuracy in complex agricultural areas.

Figure 12. Samples and prediction classification results of scheme 2 with improved HM method based on UNet++ model. (a) sample plot (S1) and classification result in 2019; (b) sample plot (S4) and classification result in 2020; (c) sample plot (S1) and classification result in 2021; (d) sample plot (S2) and classification result in 2022; (e) the crop and urban classification in other subregions in 2019; (f) the crop and river classification in other subregions in 2020.


5.4. Assessment of the model-based transferable learning

In this experiment, we evaluated the crop classification results and accuracy of different deep-learning models in the same city and an adjacent city over multiple years without samples. On the one hand, the trained semantic segmentation model can accomplish image-based crop classification without samples, reducing the dependence on training samples of traditional supervised classification methods and improving the generalization ability of crop classification models. On the other hand, the model can obtain multi-year crop classification results quickly and accurately. The prediction time (8.62 minutes) and classification accuracy (Figure 9) for the Sentinel-2 images of the entire study area from 2019 to 2022 demonstrate clear advantages in reducing the manual interpretation workload. This study constructed a regional crop semantic segmentation model in the spirit of Tobler's First Law of Geography, and the model generalized well in adjacent regions with similar crop types and phenological periods.

The UNet++ network of scheme 2 with the improved HM method performed best in the entire study area, with OA, Kappa, and F1-score exceeding 87%, 82%, and 65%, respectively. In the domestic transfer cases, this model produced satisfactory predictions with only small gaps in accuracy. However, the performance of the models in larger regions or in other countries with similar crop types and conditions remains to be examined. For transfer across time, we used the weak samples of 2019 and 2020 to train the models and predicted the classification results from 2019 to 2022; these semantic segmentation models transferred to additional years with varying classification accuracy, as shown in Figure 9. The UNet++ model performed well in cross-year transfer, indicating that the improved HM method enhanced the contrast and consistency of the multi-temporal images and thus alleviated the effect of differences in crop phenology. Without training samples, the models transferred satisfactorily across regions and years in the study area based on the weak samples. In this study, the source and target domains were selected from latitudes in close proximity with similar growth phenology; certain limitations therefore persist regarding the geographic area and target-domain categories, offering promising avenues for future work. The construction of geographical zones also serves to mitigate the accumulation and propagation of representative errors across distinct regional models, especially for minor crops planted regionally.

6. Conclusions

RS-based accurate crop mapping is critical for monitoring agricultural practices and food production. However, because of the seasonal nature of crops, previous samples and traditional machine learning models generated within a specific domain often lose their validity across years and regions. In this study, we proposed an improved HM method to alleviate the differences between multi-temporal images and the negative impact of domain shift, thus enhancing transferability for efficient semantic segmentation of remotely sensed crop and land-use images. Specifically, the data set used for model training comprised multi-feature images and weak labels extracted from Sentinel-2 imagery. Three semantic segmentation networks, DeepLab V3+, UNet, and UNet++, were then selected to evaluate model performance and transferability in adjacent regions without samples. The improved HM method was found to enhance the contrast of multi-temporal images and the model performance of different feature selection schemes. Coupling spectral, vegetation index, and texture features, the UNet++ model based on the improved HM method and weak samples outperformed the other two models and achieved higher classification accuracy in transfer cases, with OA, Kappa, and F1-score exceeding 87%, 82%, and 65%, respectively.

Moreover, the representative error of weak samples and prediction classification results were analyzed to improve the model’s robustness further. The method proposed in this study provides an efficient solution for crop and land use mapping in label-missing regions and years. In future research, we will continue to explore the potential and spatiotemporal generalization ability of semantic segmentation networks for geospatial vision tasks.

CRediT authorship contribution statement

Lijun Wang: Conceptualization, methodology, writing, reviewing, and editing. Yang Bai: Investigation, visualization, and editing. Jiayao Wang: Conceptualization, supervision, project administration, and funding acquisition. Zheng Zhou: Editing. Fen Qin: Conceptualization and supervision. Jiyuan Hu: Investigation and editing.

Acknowledgments

We sincerely thank the anonymous reviewers for their constructive comments and insightful suggestions, which greatly improved the quality of this manuscript. We also acknowledge the support from the Henan Dabieshan National Field Observation and Research Station of Forest Ecosystem.

Disclosure statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability statement

The code is shared at https://github.com/AgriRS/Crop_DL_netwoks.git.

Additional information

Funding

This study was supported by the National Natural Science Foundation of China [grant number U21A2014], National Science and Technology Platform Construction [grant number 2005DKA32300], High Resolution Satellite Project of the State Administration of Science [grant number 80Y50G19-9001-22/23], Key Laboratory of Geospatial Technology for Middle and Lower Yellow River Regions (Henan University), Ministry of Education [grant number GTYR202203], and the Science and Technology Development Program of Henan Province [grant number 232102321032].

References

  • Adrian, J., V. Sagan, and M. Maimaitijiang. 2021. “Sentinel SAR-Optical Fusion for Crop Type Mapping Using Deep Learning and Google Earth Engine.” ISPRS Journal of Photogrammetry and Remote Sensing 175: 215–21. https://doi.org/10.1016/j.isprsjprs.2021.02.018.
  • Badrinarayanan, V., A. Handa, and R. Cipolla. 2015. “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling.” arXiv preprint arXiv:1505.07293. https://doi.org/10.48550/arXiv.1505.07293.
  • Cao, B. A., Y. M. Du, D. Q. Xu, H. Li, and Q. H. Liu. 2015. “An Improved Histogram Matching Algorithm for the Removal of Striping Noise in Optical Remote Sensing Imagery.” Optik 126 (23): 4723–4730. https://doi.org/10.1016/j.ijleo.2015.08.079.
  • Chaves, M. E. D., M. C. A. Picoli, and I. D. Sanches. 2020. “Recent Applications of Landsat 8/OLI and Sentinel-2/MSI for Land Use and Land Cover Mapping: A Systematic Review.” Remote Sensing 12 (18): 3062. https://doi.org/10.3390/rs12183062.
  • Chen, L. C., Y. K. Zhu, G. Papandreou, F. Schroff, and H. Adam. 2018. “Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.” Proceedings of the European Conference on Computer Vision (ECCV) 11211: 833–851. https://doi.org/10.48550/arXiv.1802.02611.
  • Cui, H., L. Zhang, H. B. Ai, B. Xu, and Z. H. Wang. 2017. “Large Collection Satellite Images Color Normalization Algorithm Based on Tone Reference Map.” Acta Geodaetica et Cartographica Sinica 46 (12): 1986–1997. https://doi.org/10.11947/j.AGCS.2017.20170402.
  • Diakogiannis, F. I., F. Waldner, P. Caccetta, and C. Wu. 2020. “ResUnet-A: A Deep Learning Framework for Semantic Segmentation of Remotely Sensed Data.” ISPRS Journal of Photogrammetry and Remote Sensing 162: 94–114. https://doi.org/10.1016/j.isprsjprs.2020.01.013.
  • Hao, P. Y., L. P. Di, C. Zhang, and L. Y. Guo. 2020. “Transfer Learning for Crop Classification with Cropland Data Layer Data (CDL) as Training Samples.” Science of the Total Environment 733: 138869. https://doi.org/10.1016/j.scitotenv.2020.138869.
  • Helmer, E. H., and B. Ruefenacht. 2006. “Cloud-Free Satellite Image Mosaics with Regression Trees and Histogram Matching.” Photogrammetric Engineering & Remote Sensing 71 (9): 1079–1089. https://doi.org/10.14358/PERS.71.9.1079.
  • Horn, B. K. P., and R. J. Woodham. 1979. “Destriping LANDSAT MSS Images by Histogram Modification.” Computer Graphics and Image Processing 10 (1): 69–83. https://doi.org/10.1016/0146-664X(79)90035-2.
  • Ienco, D., R. Interdonato, R. Gaetano, and D. H. T. Minh. 2019. “Combining Sentinel-1 and Sentinel-2 Satellite Image Time Series for Land Cover Mapping via a Multi-Source Deep Learning Architecture.” ISPRS Journal of Photogrammetry and Remote Sensing 158: 11–22. https://doi.org/10.1016/j.isprsjprs.2019.09.016.
  • Izquierdo-Verdiguier, E., and R. Zurita-Milla. 2020. “An Evaluation of Guided Regularized Random Forest for Classification and Regression Tasks in Remote Sensing.” International Journal of Applied Earth Observation and Geoinformation 88: 102051. https://doi.org/10.1016/j.jag.2020.102051.
  • Jayasankari, S., and S. Domnic. 2020. “Histogram Shape Based Gaussian Sub-Histogram Specification for Contrast Enhancement.” Intelligent Decision Technologies-Netherlands 14 (1): 67–80. https://doi.org/10.3233/IDT-190081.
  • Kaiser, P., J. D. Wegner, A. Lucchi, M. Jaggi, T. Hofmann, and K. Schindler. 2017. “Learning Aerial Image Segmentation from Online Maps.” IEEE Transactions on Geoscience & Remote Sensing 55 (11): 6054–6068. https://doi.org/10.1109/TGRS.2017.2719738.
  • Khan, A. H., M. M. Fraz, and M. Shahzad. 2021. “Deep Learning Based Land Cover and Crop Type Classification: A Comparative Study.” 2021 International Conference on Digital Futures and Transformative Technologies (ICoDT2), Islamabad, Pakistan, 1–6. https://doi.org/10.1109/ICoDT252288.2021.9441483.
  • Li, Y. S., W. Chen, Y. J. Zhang, C. Tao, R. Xiao, and Y. H. Tan. 2020. “Accurate Cloud Detection in High-Resolution Remote Sensing Imagery by Weakly Supervised Deep Learning.” Remote Sensing of Environment 250: 112045. https://doi.org/10.1016/j.rse.2020.112045.
  • Li, W. B., K. M. Sun, Z. T. Du, X. Q. Hu, W. Z. Li, J. J. Wei, and S. Gao. 2021. “PCNet: Cloud Detection in FY-3D True-Color Imagery Using Multi-Scale Pyramid Contextual Information.” Remote Sensing 13 (18): 3670. https://doi.org/10.3390/rs13183670.
  • Li, Z. H., H. Y. Zhang, F. X. Lu, R. Y. Xue, G. Y. Yang, and L. P. Zhang. 2022. “Breaking the Resolution Barrier: A Low-To-High Network for Large-Scale High-Resolution Land-Cover Mapping Using Low-Resolution Labels.” ISPRS Journal of Photogrammetry and Remote Sensing 192: 244–267. https://doi.org/10.1016/j.isprsjprs.2022.08.008.
  • Martinez, J. A. C., L. E. C. La Rosa, R. Q. Feitosa, I. D. Sanches, and P. N. Happ. 2021. “Fully Convolutional Recurrent Networks for Multidate Crop Recognition from Multitemporal Image Sequences.” ISPRS Journal of Photogrammetry and Remote Sensing 171: 188–201. https://doi.org/10.1016/j.isprsjprs.2020.11.007.
  • Paris, C., and L. Bruzzone. 2019. “Automatic Extraction of Weak Labeled Samples from Existing Thematic Products for Training Convolutional Neural Networks.” 2019 IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2019), Yokohama, Japan, 5722–5725. https://doi.org/10.1109/IGARSS.2019.8900649.
  • Paris, C., and L. Bruzzone. 2021. “A Novel Approach to the Unsupervised Extraction of Reliable Training Samples from Thematic Products.” IEEE Transactions on Geoscience & Remote Sensing 59 (3): 1930–1948. https://doi.org/10.1109/TGRS.2020.3001004.
  • Perantoni, G., and L. Bruzzone. 2022. “A Novel Technique for Robust Training of Deep Networks with Multisource Weak Labeled Remote Sensing Data.” IEEE Transactions on Geoscience & Remote Sensing 60: 1–15. https://doi.org/10.1109/TGRS.2021.3091482.
  • Portales-Julia, E., M. Campos-Taberner, F. J. Garcia-Haro, and M. A. Gilabert. 2021. “Assessing the Sentinel-2 Capabilities to Identify Abandoned Crops Using Deep Learning.” Agronomy-Basel 11 (4): 654. https://doi.org/10.3390/agronomy11040654.
  • Qi, X. Q., K. Q. Li, P. K. Liu, X. G. Zhou, and M. Y. Sun. 2020. “Deep Attention and Multi-Scale Networks for Accurate Remote Sensing Image Segmentation.” IEEE Access 8: 146627–146639. https://doi.org/10.1109/ACCESS.2020.3015587.
  • Ronneberger, O., P. Fischer, and T. Brox. 2015. “U-Net: Convolutional Networks for Biomedical Image Segmentation.” Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Part III, 9351: 234–241. https://doi.org/10.1007/978-3-319-24574-4_28.
  • Roy, A. G., N. Navab, and C. Wachinger. 2018. “Concurrent Spatial and Channel ‘Squeeze & Excitation’ in Fully Convolutional Networks.” Medical Image Computing and Computer Assisted Intervention (MICCAI 2018), Part I, 11070: 421–429. https://doi.org/10.1007/978-3-030-00928-1_48.
  • Song, X. P., W. L. Huang, M. C. Hansen, and P. Potapov. 2021. “An Evaluation of Landsat, Sentinel-2, Sentinel-1 and MODIS Data for Crop Type Mapping.” Science of Remote Sensing 3: 100018. https://doi.org/10.1016/j.srs.2021.100018.
  • Su, N. E., S. Jarvie, Y. Z. Yan, X. Q. Gong, F. S. Li, P. Han, and Q. Zhang. 2022. “Landscape Context Determines Soil Fungal Diversity in a Fragmented Habitat.” Catena 213: 106163. https://doi.org/10.1016/j.catena.2022.106163.
  • Waldner, F., and F. I. Diakogiannis. 2020. “Deep Learning on Edge: Extracting Field Boundaries from Satellite Images with a Convolutional Neural Network.” Remote Sensing of Environment 245: 111741. https://doi.org/10.1016/j.rse.2020.111741.
  • Wang, Y. M., L. W. Feng, W. W. Sun, Z. Zhang, H. Y. Zhang, G. Yang, and X. C. Meng. 2022. “Exploring the Potential of Multi-Source Unsupervised Domain Adaptation in Crop Mapping Using Sentinel-2 Images.” GIScience & Remote Sensing 59 (1): 2247–2265. https://doi.org/10.1080/15481603.2022.2156123.
  • Wang, L. J., J. Y. Wang, Z. Z. Liu, J. Zhu, and F. Qin. 2022b. “Evaluation of a Deep-Learning Model for Multispectral Remote Sensing of Land Use and Crop Classification.” The Crop Journal 10 (5): 1435–1451. https://doi.org/10.1016/j.cj.2022.01.009.
  • Wang, L. J., J. Y. Wang, X. W. Zhang, L. G. Wang, and F. Qin. 2022a. “Deep Segmentation and Classification of Complex Crops Using Multi-Feature Satellite Imagery.” Computers and Electronics in Agriculture 200: 107249. https://doi.org/10.1016/j.compag.2022.107249.
  • Wang, Y. M., Z. Zhang, L. W. Feng, Y. C. Ma, and Q. Y. Du. 2021. “A New Attention-Based CNN Approach for Crop Mapping Using Time Series Sentinel-2 Images.” Computers and Electronics in Agriculture 184: 106090. https://doi.org/10.1016/j.compag.2021.106090.
  • Xu, Q., J. S. Zhang, F. Zhang, S. Ge, Z. Yang, and Y. M. Duan. 2022. “Applicability of Weak Samples to Deep Learning Crop Classification.” National Remote Sensing Bulletin 26 (7): 1395–1409. https://doi.org/10.11834/jrs.20221127.
  • Xu, J. F., Y. Zhu, R. H. Zhong, Z. X. Lin, J. L. Xu, H. Jiang, J. F. Huang, H. F. Li, and T. Lin. 2020. “DeepCropmapping: A Multi-Temporal Deep Learning Approach with Improved Spatial Generalizability for Dynamic Corn and Soybean Mapping.” Remote Sensing of Environment 247: 111946. https://doi.org/10.1016/j.rse.2020.111946.
  • Yuan, X. H., J. F. Shi, and L. C. Gu. 2021. “A Review of Deep Learning Methods for Semantic Segmentation of Remote Sensing Imagery.” Expert Systems with Applications 169: 114417. https://doi.org/10.1016/j.eswa.2020.114417.
  • Zhang, J. Z., S. J. Cong, G. Zhang, Y. J. Ma, Y. Zhang, and J. P. Huang. 2022. “Detecting Pest-Infested Forest Damage Through Multispectral Satellite Imagery and Improved UNet++.” Sensors 22 (19): 7440. https://doi.org/10.3390/s22197440.
  • Zhou, Z. W., M. M. R. Siddiquee, N. Tajbakhsh, and J. M. Liang. 2018. “UNet++: A Nested U-Net Architecture for Medical Image Segmentation.” Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support (DLMIA 2018) 11045: 3–11. https://doi.org/10.1007/978-3-030-00889-5_1.
  • Zhu, Q. Q., Y. Lei, X. L. Sun, Q. F. Guan, Y. F. Zhong, L. P. Zhang, and D. R. Li. 2022. “Knowledge-Guided Land Pattern Depiction for Urban Land Use Mapping: A Case Study of Chinese Cities.” Remote Sensing of Environment 272: 112916. https://doi.org/10.1016/j.rse.2022.112916.
  • Zhu, X. X., D. Tuia, L. C. Mou, G. S. Xia, L. P. Zhang, F. Xu, and F. Fraundorfer. 2017. “Deep Learning in Remote Sensing: A Comprehensive Review and List of Resources.” IEEE Geoscience and Remote Sensing Magazine 5 (4): 8–36. https://doi.org/10.1109/MGRS.2017.2762307.