
A shape-attention Pivot-Net for identifying central pivot irrigation systems from satellite images using a cloud computing platform: an application in the contiguous US

Article: 2165256 | Received 01 Sep 2022, Accepted 31 Dec 2022, Published online: 19 Jan 2023

ABSTRACT

Forty percent of global food production relies upon irrigation, which accounts for 70% of total global freshwater use. Thus, the mapping of cropland irrigation plays a significant role in agricultural water management and estimating food production. However, current spaceborne irrigated cropland mapping relies heavily on spectral behavior, which often has high uncertainty and lacks information about the method of irrigation. Deep learning (DL) allows irrigated cropland to be classified according to unique spatial patterns, such as the central pivot irrigation system (CPIS). However, convolutional neural networks (CNNs) are usually biased toward color and texture features, and a spatially transferable and accurate CPIS identification model is lacking because previous models seldom involve the round shape of CPIS, which is usually the key to distinguishing it. To address this gap, we proposed Pivot-Net, a shape-attention neural network that integrates a spatial-attention gate, residual blocks, and multi-task learning to incorporate shape information when identifying CPIS in satellite imagery. Specifically, we trained our model on CPIS in Kansas using Sentinel-2 and Landsat-8 optical images. We found that Pivot-Net is superior to seven state-of-the-art semantic segmentation models on second-stage validation. We also evaluated the performance of Pivot-Net at three validation sites, which yielded an average F1-score and mean IOU of 90.68% and 90.45%, respectively, further demonstrating the high accuracy of the proposed model. Moreover, to show that the proposed Pivot-Net can map CPIS at the country scale, we generated the first 30-m CPIS map for the contiguous US using a cloud computing platform and our Pivot-Net model. The total CPIS area for the contiguous US was 61,094 km2 in 2018, which comprised 26.22% of all irrigated lands. Our results can be accessed at https://tianfyou.users.earthengine.app/view/cpisus. The proposed shape-attention Pivot-Net therefore demonstrates the ability to classify CPIS at large spatial scales and is feasible for mapping CPIS at national scales.

1 Introduction

Irrigation is a prevailing agricultural practice that increases crop yields and mitigates the impacts of drought on agricultural crops (Dheeravath et al. Citation2010; Ozdogan et al. Citation2010). Globally, only 20% of cropland is irrigated, but this small fraction accounts for 40% of global food production (Brocca et al. Citation2018; Salmon et al. Citation2015; Droogers, Immerzeel, and Lorite Citation2010). Also, more than 70% of freshwater use is for agricultural irrigation (Chen et al. Citation2018; Salmon et al. Citation2015). Therefore, the mapping of irrigated cropland and irrigation types is important for agricultural water management and crop production estimation. Previous studies have made great efforts to map irrigated cropland using the spectral information of satellite data and related environmental features, such as irrigated cropland mapping in the US (Deines et al. Citation2019; Xie et al. Citation2019), China (Zhang et al. Citation2022), and Central Asia (Ragettli, Herberz, and Siegfried Citation2018). However, the accuracy of irrigated cropland maps produced by these methods depends on the representativeness of the training samples (Wu et al. Citation2022), and such methods cannot identify the irrigation method (de Albuquerque et al. Citation2020).

Sprinkler irrigation is a popular irrigation method that is advocated as water-saving by many local governments due to its high water use efficiency (Howell Citation2003; Chen et al. Citation2019). Center pivot irrigation systems (CPIS) are the most popular sprinkler irrigation systems in developed countries because of their high water use efficiency, uniformity, and adaptability to various terrains (Waller et al. Citation2016). The identification of CPIS can benefit the estimation of agricultural water consumption and food production (Li, Johansen, and McCabe Citation2022). However, a universal, generalized model for identifying CPIS is lacking. CPIS can be visually interpreted from satellite images when sparsely distributed over a small area due to their round shape (Johansen et al. Citation2021; Chen et al. Citation2019). However, distinguishing CPIS manually through visual interpretation is time-consuming and labor-intensive when they are densely distributed over a large area, let alone at a national scale.

Deep learning (DL) models with convolutional neural networks (CNNs) are able to extract hierarchical features (Waldner and Diakogiannis Citation2020; Zhou et al. Citation2020; Hinton and Salakhutdinov Citation2006), including low-level features like texture, spectral, and edge features, as well as high-level features like spatial context information and shapes (Sun et al. Citation2020). It is therefore possible to identify specific irrigation types, such as CPIS, using a DL model. With the support of U-Net, an improved architecture based on the FCN, several researchers were able to identify center pivot systems in Brazil (de Albuquerque et al. Citation2020; Saraiva et al. Citation2020). Using PlanetScope satellite images, Saraiva et al. identified CPIS with the U-Net model and achieved a high-accuracy result (Saraiva et al. Citation2020). de Albuquerque et al. compared the performance of three popular frameworks, U-Net (Falk et al. Citation2019), Deep ResUnet (Zhang, Liu, and Wang Citation2018), and SharpMask (Pinheiro et al. Citation2016), and found that U-Net had state-of-the-art results in CPIS identification (de Albuquerque et al. Citation2020). However, recent studies have shown that CNNs typically focus on the color and texture features of the target object while ignoring its shape information (Geirhos et al. Citation2018), resulting in low accuracy and weak generalization when applied to other regions.

The inclusion of shape information can improve the accuracy and robustness of the CNN framework in target recognition (Geirhos et al. Citation2018). There are several ways to increase the shape weights in CNN models. A common approach is to add edge information as input to the model. Tang et al. (2021) used the Canny edge detector to extract edge information and integrated it into the CNN for the CPIS recognition task, but the improvement of this method is limited because a CNN can extract edge information by itself. Adding different image styles to the training data to reduce the weight of color and texture is another way to enhance the shape weight in target recognition (Shi et al. Citation2020; Mummadi et al. Citation2021); this approach is more suited to style transfer tasks. Adding a secondary shape information flow to induce the model to learn more shape-related information is an alternative way to enhance target identification (Sun et al. Citation2020; Takikawa et al. Citation2019). Sun et al. (Citation2020) proposed a shape-attentive U-Net for medical image segmentation and achieved state-of-the-art results on two public medical image segmentation datasets. Takikawa et al. (Citation2019) designed gated shape CNNs for segmentation tasks and also achieved the highest accuracy on the Cityscapes dataset. The inclusion of this shape information can improve the recognition precision of targets, but inevitably increases the computational complexity and reduces the generality of the model.

Inspired by the process of human image understanding, an attention mechanism was recently proposed (Vaswani et al. Citation2017). Rather than processing the whole image at once, a human being focuses on the important parts of an image after a glimpse. Recently, a Convolutional Block Attention Module (CBAM) was proposed by integrating channel-wise and spatial attention (Woo et al. Citation2018). Wang et al. (Citation2017) proposed a residual attention network, which achieves state-of-the-art performance in object recognition and shows robustness against noisy labels. Oktay et al. (2018) designed an attention U-Net to segment the pancreas in medical images. In both lines of research, boundary prediction was an additional task performed alongside segmentation. The attention mechanism can help teach the model to focus on shape information. Here, we introduce a shape-attention neural network named Pivot-Net to identify CPIS in satellite imagery using spatial attention mechanisms and a multi-task learning method, which is a new attempt to incorporate shape information into CPIS identification without adding more parameters.

After the emergence of fully convolutional neural networks (FCN) in 2015 (Long, Shelhamer, and Darrell Citation2015), semantic segmentation models have been widely applied to identify roads (Zhou et al. Citation2020), detect clouds (Li et al. Citation2019; Jeppesen et al. Citation2019; Li et al. Citation2020), delineate field boundaries (Waldner and Diakogiannis Citation2020), segment buildings (Yi et al. Citation2019; Zhao et al. Citation2018), and classify burned areas (Zhang et al. Citation2021; Pinto et al. Citation2020). However, applying DL models to satellite image land cover classification at large scales, such as national scales, is challenging due to high computing costs and the lack of a general model and a practical framework.

The emergence of cloud computing platforms, such as the Google Earth Engine (GEE), Amazon Web Services (AWS), and Alibaba Cloud, has accelerated the classification of high-resolution satellite images of Earth’s surface into thematic maps due to their high computation capacity. Over 40 petabytes of publicly available data, including Landsat-8 and Sentinel-2 datasets, have been archived in the GEE platform. It is easy to composite pre-processed cloud-free satellite images in GEE. Using the Google Cloud Platform, we designed a framework to identify CPIS on a national scale using Pivot-Net.

Here, we used several steps to apply DL methods for identifying CPIS in satellite imagery. First, we designed a DL architecture that utilizes shape information, named shape-attention Pivot-Net, to identify CPIS. Second, we trained a more general CNN model that can support time-series satellite images from Landsat-8 and Sentinel-2. Finally, we used a cloud computing platform to apply the trained DL model to identify CPIS, one of the most widespread irrigation methods, across the US. Applying the model throughout the US effectively tests the generalization capability of Pivot-Net due to the country's complex climate and rich landscapes. It can also provide critical support for agricultural water management in the US, where arid and semi-arid areas account for more than half of the country's land area.

We aimed to answer the following questions: 1) can shape-attention Pivot-Net successfully identify CPIS in satellite images with high accuracy; 2) how does Pivot-Net perform when spatially transferred to regions without labels; and 3) what is the efficiency of the Google Cloud Platform when applying a well-trained DL model to satellite image segmentation at a large spatial scale, such as the contiguous US?

2 Materials and methods

2.1 Study area

The contiguous US was selected as the study area to identify CPIS. The contiguous United States covers an area of approximately 9.17 million square kilometers, making it the fourth largest country. The United States has a diverse climate and complex landscapes, such as the Rocky Mountains, the Columbia Plateau, high plains deserts, and the Great Plains, which provide an ideal area for testing the robustness and generalizability of Pivot-Net.

The Global Irrigation Area Map produced by the International Water Management Institute (IWMI-GIAM) indicated that the total irrigated cropland area of the US is around 27.5 Mha, which accounts for ~18.5% of the cropland area in the US. The irrigated croplands in the US are mainly distributed along the Columbia River, Snake River, lower Mississippi River, and the Central Valley in California. Additionally, there are some smaller irrigated areas in southern Nebraska, Kansas, and northern Texas (Ozdogan and Gutman Citation2008). The expansion of irrigation has triggered the decline of groundwater levels in the High Plains and the California Central Valley. According to previous research, irrigation accounted for 42% of total freshwater withdrawal in 2015 and consumed 80-90% of the total consumptive water in the US (Xie et al. Citation2019; Dieter et al. Citation2018). About 70% of the total irrigated cropland is in the western part of the US (Ozdogan and Gutman Citation2008), especially in the High Plains Aquifer (Deines et al. Citation2019). The proportion of pressurized irrigation systems increased from 37% in 1984 to 72% in 2018 (USDA Citation2021). The center pivot system is an important irrigation method that is widely used in the US (Zhang et al. Citation2018), especially in the central High Plains (Nebraska, Kansas, and northern Texas), which face serious groundwater depletion issues (Scanlon et al. Citation2012). CPIS identification can help to optimize groundwater management.

2.2 Dataset

2.2.1 Center pivot irrigation system dataset (CPISD)

The CPIS is the dominant form of irrigation in the High Plains area, so we selected Kansas and Nebraska as the pilot area to train and test our model. We composited the satellite data in the GEE platform from March to August of 2018, which covers the start and peak of the growing season for major crops (Liu et al. Citation2020). The qualityMosaic() method in GEE produces a maximum composite based on a specified quality metric; for example, if the quality metric is NDVI, the composite keeps the observation with the maximum NDVI at each pixel. To maintain the original signal of the satellite data and filter for cloud-free data, we used the cloud-free cover (%) of each image as the quality band and then used qualityMosaic() to mosaic the cloud-free imagery. Both land surface reflectance from Landsat-8 (USGS Landsat 8 Level 2 collection in GEE) and Sentinel-2 (Sentinel-2 MSI: Level-2A collection in GEE) were used as data sources, archived in GEE and scaled to 0-1 to ensure comparability of the numeric values. We selected the blue, green, red, and near-infrared reflectance bands as input (B2, B3, B4, B8 for Sentinel-2; B2, B3, B4, B5 for Landsat-8). In GEE, the Sentinel-2 and Landsat-8 collections were combined with the merge() function in ascending order of acquisition time.
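As a concrete illustration, the compositing step can be sketched with the GEE Python API as below. This is a minimal sketch, not the authors' exact script: the collection IDs, cloud-metadata property names, and reflectance scaling factors follow the standard GEE catalog, and the region geometry is a rough placeholder.

```python
# A minimal sketch of the cloud-free compositing described above, using the
# GEE Python API. Collection IDs, cloud-metadata properties, and scaling
# factors are assumptions based on the standard GEE catalog.
import ee

ee.Initialize()
region = ee.Geometry.Rectangle([-102.05, 37.0, -94.6, 40.0])  # approx. Kansas

def prep_s2(img):
    # Per-image cloud-free percentage becomes the quality band for qualityMosaic().
    cloud = ee.Number(img.get('CLOUDY_PIXEL_PERCENTAGE'))
    quality = ee.Image.constant(ee.Number(100).subtract(cloud)).rename('quality')
    refl = img.select(['B2', 'B3', 'B4', 'B8'],
                      ['blue', 'green', 'red', 'nir']).divide(10000)  # scale to 0-1
    return refl.addBands(quality)

def prep_l8(img):
    cloud = ee.Number(img.get('CLOUD_COVER'))
    quality = ee.Image.constant(ee.Number(100).subtract(cloud)).rename('quality')
    refl = img.select(['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5'],
                      ['blue', 'green', 'red', 'nir']).multiply(0.0000275).add(-0.2)
    return refl.addBands(quality)

s2 = (ee.ImageCollection('COPERNICUS/S2_SR')
      .filterBounds(region).filterDate('2018-03-01', '2018-08-31').map(prep_s2))
l8 = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
      .filterBounds(region).filterDate('2018-03-01', '2018-08-31').map(prep_l8))

# merge() appends the two collections; qualityMosaic() keeps, per pixel, the
# band values from the image with the highest 'quality' (most cloud-free).
composite = s2.merge(l8).qualityMosaic('quality').select(['blue', 'green', 'red', 'nir'])
```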

CPIS in Kansas and part of Nebraska were annotated manually. First, the CPIS were interpreted visually in ArcGIS regardless of the condition of cultivation and recorded in a shapefile. Then, the shapefile was converted into raster (TIFF) format with a spatial resolution of 30 m. All the images were split into 256×256 patches with an overlap of 128 pixels using the GDAL package (https://gdal.org/). More information about sample annotation can be found in Supplementary Material 1.
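The patch-splitting step can be sketched as follows; this is a minimal sketch with hypothetical file names, assuming the image and label rasters are co-registered.

```python
# A minimal sketch: tile a co-registered image/label pair into 256x256
# patches with a 128-pixel stride using GDAL and NumPy.
import numpy as np
from osgeo import gdal

PATCH, STRIDE = 256, 128

def split_patches(image_path, label_path):
    img = gdal.Open(image_path).ReadAsArray()   # (bands, H, W)
    lab = gdal.Open(label_path).ReadAsArray()   # (H, W)
    _, h, w = img.shape
    pairs = []
    for row in range(0, h - PATCH + 1, STRIDE):
        for col in range(0, w - PATCH + 1, STRIDE):
            x = img[:, row:row + PATCH, col:col + PATCH].transpose(1, 2, 0)  # to HWC
            y = lab[row:row + PATCH, col:col + PATCH]
            pairs.append((x, y))
    return pairs

patches = split_patches('kansas_composite.tif', 'kansas_cpis_label.tif')  # hypothetical paths
```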

Finally, we labeled 5559 images of 256×256 pixels in the whole of Kansas (Figure 1(b)) and 639 images in the South Platte River (a tributary of the Missouri River) and part of the High Plains region (yellow rectangle in Figure 1), where CPIS is dense around the Nebraska, Colorado, and Kansas borders. After time-series augmentation and multi-sensor augmentation, which are introduced in section 2.2.2, we collected a total of 11,424 images for Kansas and 1278 images for part of Nebraska. In Kansas, 10,000 images were used as the training dataset and 1424 as the test dataset to determine the hyper-parameters of the DL models (~9:1). We used the 1278 images of Nebraska as the first-stage validation dataset to evaluate all DL models.

Figure 1. Map of the study area (a) and composited images (RGB: near infrared, red and green) in the training & test area of Kansas (b). (c) are the images in the first-stage validation area at the borders of Colorado, Nebraska, and Kansas. The cropland mask is cultivated cropland from the cropland data layer (CDL) in 2018 (Boryan et al. Citation2011).


To evaluate the performance of the well-trained state-of-the-art DL models during spatial transfer, three sites with a high density of CPIS and different landforms were selected. An additional 583, 434, and 583 images of 256×256 pixels were collected in Washington, Georgia, and Texas as second-stage validation areas. The specific locations of the validation areas are shown as green rectangles in Figure 1.

2.2.2 Data augmentation

Previous research showed that data augmentation can improve the generalization of a model and avoid overfitting with limited samples (Perez and Wang Citation2017; Pinto et al. Citation2020). It is a practical way to enlarge the dataset by applying transformations that increase the representativeness of the training dataset (Pinto et al. Citation2020). Usually, these tricks include zoom in/out, flip up/down, rotation, adding Gaussian/random noise, and contrast transformation (Waldner and Diakogiannis Citation2020), as shown in Figure 2. Specifically, flip and rotation transformations simulate the different orientations of CPIS in the real world. The contrast transformation captures differences in solar illumination. Zoom in/out slightly changes the size of CPIS.

Figure 2. Example of data augmentation: flip up/down, rotation, random noise, Gaussian noise, contrast transformation, and zoom in/out.


In addition to these transformations, we performed two additional data augmentation operations on the satellite data. To ensure the model's applicability to time-series data, we conducted time-series augmentation on the training dataset. We composited the background satellite data over several periods, including March-May, April-July, and May-August of 2018, which represented the beginning, middle, and peak of the growing season. To facilitate generalization between different satellite sensors, we used 30-m satellite images from Sentinel-2, Landsat-8, and a combination of the two.
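The conventional transformations can be sketched in NumPy as below; this is a minimal sketch in which the noise level and contrast range are illustrative assumptions, the 25% probability follows section 3.2, and zoom in/out would follow the same pattern via cropping and resizing.

```python
# A minimal NumPy sketch of the conventional augmentations (flip, rotation,
# noise, contrast); each transformation is applied with 25% probability.
import numpy as np

rng = np.random.default_rng(42)

def augment(x, y):
    """x: (256, 256, 4) reflectance patch in [0, 1]; y: (256, 256) binary label."""
    if rng.random() < 0.25:                                   # flip up/down
        x, y = np.flipud(x), np.flipud(y)
    if rng.random() < 0.25:                                   # rotate 90/180/270 deg
        k = int(rng.integers(1, 4))
        x, y = np.rot90(x, k), np.rot90(y, k)
    if rng.random() < 0.25:                                   # Gaussian noise
        x = np.clip(x + rng.normal(0.0, 0.01, x.shape), 0, 1)
    if rng.random() < 0.25:                                   # contrast transformation
        x = np.clip((x - x.mean()) * rng.uniform(0.8, 1.2) + x.mean(), 0, 1)
    return x, y
```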

Figure 3(a-c) shows time-series images from a combination of Sentinel-2 and Landsat-8 during March-May, April-July, and May-August of 2018, and Figure 3(d) is the corresponding label. The CPIS had different characteristics during these three periods due to changes in vegetation. Figure 3(e,g) is an example of multi-sensor image augmentation: these are images from different satellites during May-August, but their spectral distributions (Figure 3(f,h)) vary. Besides conventional data augmentation transformations, the existing observations from time-series and multiple sensors can thus also be used to enlarge the sample size.

Figure 3. Example of data augmentation with time-series and multiple sensors. (b-c) are time-series data augmentations of (a). (d) is the label. (e) and (g) are multi-sensor data augmentations of (a), derived from the Landsat-8 and Sentinel-2 satellites, respectively. (f) and (h) are the corresponding spectral distributions of (e) and (g).


3 Methods

To incorporate the round shape information into the CNNs, we designed Pivot-Net, which integrates multi-task learning and a spatial attention mechanism to induce the model to learn shape information. The model was trained and tested in Kansas. Pivot-Net was validated in parts of Nebraska and Colorado, near the training sites, as the first-stage validation. To test the spatial transferability of the proposed Pivot-Net, it was validated at three validation sites far from the training sites as the second-stage validation. After Pivot-Net was well trained and validated, it was applied across the contiguous US. The framework of Pivot-Net training and application at the national scale is shown in Figure 4.

Figure 4. Framework of Pivot-Net for training and application at the national scale.


3.1 Pivot-Net

3.1.1 Overall architecture

The shape-attention Pivot-Net is based on the classic encoder-decoder architecture of the U-Net backbone (Falk et al. Citation2019). U-Net was initially designed for biomedical image segmentation (Falk et al. Citation2019). U-Net (Flood, Watson, and Collett Citation2019) and U-Net-like architectures, including ResUNet (Zhang, Liu, and Wang Citation2018; Waldner and Diakogiannis Citation2020), have been shown to outperform other methods for segmenting satellite data in several remote sensing image segmentation contests (Yi et al. Citation2019; Saraiva et al. Citation2020). Typically, this architecture comprises two networks: an encoder and a decoder. The encoder down-samples the input into high-dimensional feature maps, while the decoder restores or up-samples the compressed information into a pixel-wise output with the same dimensions as the input. Thus, it can concatenate high-level spatial information and low-level features (Jeppesen et al. Citation2019). The architecture of the shape-attention Pivot-Net is shown in Figure 5.

Figure 5. Architecture of the shape-attention Pivot-Net. (a) is the main encoder-decoder architecture. (b) and (c) are the layers in the residual block and spatial-level attention gate. In the architecture, there is always one batch normalization layer after each convolution layer.


Given an image set $X \in \mathbb{R}^{w \times h \times c}$ as input, Pivot-Net infers a pivot mask map $Y_{mask} \in \mathbb{R}^{w \times h \times 1}$ and a pivot boundary map $Y_{boundary} \in \mathbb{R}^{w \times h \times 1}$, where $w$, $h$, and $c$ are the width, height, and number of channels of the images, respectively. The outputs are the probabilities of the pivot mask and boundary and range from 0 to 1. The input image size was 256×256 with 4 channels (red, green, blue, and near-infrared). During encoding, the input passes through five residual blocks and four max pooling layers; the output of the encoder is a feature map of dimensions 16×16×512. This feature map is the input to the decoder, which comprises four residual blocks and upsampling layers. The output is restored to two feature maps of dimensions 256×256×1 for the pivot mask and boundary, respectively.

3.1.2 Residual block

Several modifications were made to the U-Net architecture to improve its performance in identifying CPIS. Inspired by ResNet, we replaced the simple convolutional operation with a residual block in Pivot-Net (Figure 5(b)). The residual block alleviates the problem of gradient exploding/vanishing, which emerges as the depth of the neural network increases (Yi et al. Citation2019). Rather than learning the target output directly, the residual block learns the difference between the output and the input alongside a shortcut connection, which is called the "residual" (He et al. Citation2016) and can be represented by formula (1):

(1) $y_l = h(x_l) + F(x_l, \theta_l)$

where $x_l$ is the input of the residual block, $y_l$ is the target output, and $F(x_l, \theta_l)$ denotes the residual part. In the formula, $h(x_l) = \theta_{1\times1} x_l$, where $\theta_{1\times1}$ denotes a 1×1 convolution kernel that can increase or decrease the number of channels to match the feature map of $F(x_l, \theta_l)$. $F(x_l, \theta_l)$ contains two convolutional layers, each followed by a batch normalization layer, plus one dropout layer. The filter size of each convolutional layer is 3×3 with same padding, and the number of filters equals the dimension of the target output. The dropout layer randomly zeroes units at a given rate, which helps prevent overfitting (Jeppesen et al. Citation2019). The dataset size usually has a large effect on the optimum dropout rate, with smaller datasets performing better with low dropout rates. A dropout rate of 0.5 is widely used but does not always optimize performance (Pauls and Yoder Citation2018). In our experiments, 0.25 performed best on the first-stage validation dataset, so we used 0.25 as the optimal hyperparameter.
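A minimal Keras sketch of this block follows; the ReLU activations are an assumption not fixed by the text.

```python
# Residual block of formula (1): h(x) is a 1x1 convolution matching channels;
# F(x) is two 3x3 convolutions, each followed by batch normalization, plus
# dropout (rate 0.25 as selected on the first-stage validation set).
from tensorflow.keras import layers

def residual_block(x, filters, dropout_rate=0.25):
    shortcut = layers.Conv2D(filters, 1, padding='same')(x)  # h(x): adjust channels
    y = layers.Conv2D(filters, 3, padding='same')(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)                         # assumed activation
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.BatchNormalization()(y)
    y = layers.Dropout(dropout_rate)(y)                      # randomly zeroes units
    return layers.Activation('relu')(layers.Add()([shortcut, y]))
```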

There are nine residual blocks in Pivot-Net: five in the encoder and four in the decoder. Their input and output sizes are shown in Table 1. The input of a residual block in the encoder is either the original image or the output of the max pooling layer that follows the previous residual block. The max pooling layer keeps the maximum value in each 2×2 region, so the width and height are halved while the number of channels is unchanged; the input size of the following residual block changes correspondingly.

Table 1. The input and output size of nine residual blocks in Pivot-Net.

The input of a residual block in the decoder is the concatenation of the upsampled output of the previous residual block and the output of the corresponding block in the encoder. Upsampling is the inverse operation of max pooling: it restores one value to a 2×2 region and fills the null values with 0, so the width and height are doubled. Take the 6th residual block as an example: the output size of the 5th residual block (the last one in the encoder) is 16×16×512, which is expanded to 32×32×512 after upsampling. This feature is then concatenated with the output of the 4th residual block (32×32×256). Hence, the input size of the 6th residual block is 32×32×768 (512 + 256).

In the architecture, each convolution kernel is followed by a batch normalization layer, which normalizes the batch of inputs and accelerates training, although this is not shown in Figure 5.

3.1.3 Spatial-level attention gate

Inspired by how human beings understand the real world, the attention mechanism has been shown to be a useful "cardinality design" in the field of DL (Woo et al. Citation2018; Wang et al. Citation2017; Fu et al. Citation2019). Humans usually focus on "interesting points" after a glimpse of an image. Recently, scholars have been trying to introduce this mechanism into CNN architectures to improve their performance on semantic segmentation tasks. Wang et al. (Citation2017) proposed a residual attention network, which achieved high object recognition performance and shows robustness against noisy labels. Oktay et al. (2018) designed an attention U-Net to segment the pancreas in medical images. Inspired by the spatial attention gate, we added four attention blocks in the connections between the down-sampling and up-sampling paths of the U-Net architecture (Figure 5(a)). In the spatial-level attention gate, the low-level feature is updated with high-level features (left side in Figure 5(c)), which can be represented by formulas (2)-(3):

(2) $y_l = x_l \otimes g(x_l, x_h)$
(3) $g(x_l, x_h) = F_{sigmoid}\big(\theta_{l1} x_h + F_{relu}(x_l, \theta_{l2}),\ \theta_{l1}\big)$

where $x_l \in \mathbb{R}^{w_l \times h_l \times c_l}$ and $x_h \in \mathbb{R}^{w_h \times h_h \times c_h}$ are the low-level and high-level input features. $x_l$ is the output of a residual block in the encoder, while $x_h$ comes from a residual block in the decoder. The width and height $(w_l, h_l)$ are twice $(w_h, h_h)$, respectively. $\theta_{l1} x_h$ represents a 1×1 convolutional layer, which mainly adjusts the number of channels. $F_{relu}(x_l, \theta_{l2})$ is a convolutional layer with a 2×2 kernel, a stride of 2×2, and same padding, followed by a ReLU activation layer, so the width and height of $x_l$ are halved. This value is added to the adjusted high-level features ($\theta_{l1} x_h$). $F_{sigmoid}(\cdot, \theta_{l1})$ represents a convolutional layer with a 1×1 filter followed by a sigmoid activation layer, after which $g(x_l, x_h)$ is reduced to $w \times h \times 1$. $\otimes$ denotes the element-wise product between the low-level and learned features. Finally, the original low-level feature is updated by multiplying it with the learned spatial features $g(x_l, x_h)$.

There are four spatial attention gates in Pivot-Net. Take the first one as an example: $x_l \in \mathbb{R}^{256 \times 256 \times 32}$ and $x_h \in \mathbb{R}^{128 \times 128 \times 64}$; $x_l$ is the output of the 1st residual block in the encoder, while $x_h$ comes from the 8th residual block in the decoder. Both $F_{relu}(x_l, \theta_{l2})$ and $\theta_{l1} x_h$ have size 128×128×32; their sum is converted to size 128×128×1 by one convolutional layer with a 1×1 filter. Next, the learned feature is upsampled to 256×256×1. Finally, this value is used to update $x_l$, and the output size is 256×256×32.
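A minimal Keras sketch of formulas (2)-(3) follows; layer choices beyond the text (e.g. nearest-neighbor upsampling) are assumptions.

```python
# Spatial-level attention gate: x_l is downsampled by a strided 2x2 ReLU
# convolution, added to the 1x1-convolved x_h, squeezed to a one-channel
# sigmoid map, upsampled back to x_l's size, and multiplied onto x_l.
from tensorflow.keras import layers

def attention_gate(x_l, x_h, inter_channels):
    g_h = layers.Conv2D(inter_channels, 1, padding='same')(x_h)        # theta_l1 * x_h
    g_l = layers.Conv2D(inter_channels, 2, strides=2, padding='same',
                        activation='relu')(x_l)                        # F_relu(x_l)
    g = layers.Add()([g_h, g_l])
    g = layers.Conv2D(1, 1, padding='same', activation='sigmoid')(g)   # F_sigmoid
    g = layers.UpSampling2D(size=(2, 2))(g)                            # back to x_l size
    return layers.Multiply()([x_l, g])                                 # y_l = x_l (x) g
```

For the first gate, for instance, `x_l` of size 256×256×32 and `x_h` of size 128×128×64 with `inter_channels=32` reproduce the sizes described above.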

3.1.4 Multi-task training mechanism

In addition to the spatial attention gate, the inclusion of shape information is another key issue when identifying round CPIS. Sun et al. (Citation2020) proposed a shape-attentive U-Net for medical image segmentation and achieved highly accurate results on two public medical image segmentation datasets. Takikawa et al. (Citation2019) designed gated shape CNNs for segmentation tasks and also achieved high accuracy on the Cityscapes dataset. In both studies, boundary prediction is an additional task performed alongside segmentation. To capture the round shape of CPIS, we used multi-task learning to train our model, which included pixel-wise segmentation and boundary prediction. According to previous research, training networks with a related multi-task loss can improve the accuracy of the initial task and the generalization of the model, because it learns from shared representations (Takikawa et al. Citation2019; Waldner and Diakogiannis Citation2020). For Pivot-Net, the main goal is to enforce learning of shape-related intermediate representations through the joint boundary detection and segmentation tasks.

As shown in Figure 5, the output size of the last residual block is 256×256×32. The output layer includes two convolutional layers with 1×1 filters. In each convolutional layer, the size is reduced to 256×256×1, producing the pivot mask and pivot boundary, respectively.
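The two output heads can be sketched as below; `inputs` and `decoder_out` are hypothetical tensors standing for the network input and the final 256×256×32 decoder feature map, and the sigmoid activations match the 0-1 probability outputs described in section 3.1.1.

```python
# Dual output heads: parallel 1x1 convolutions with sigmoid activation reduce
# the final decoder feature map to the pivot mask and pivot boundary maps.
import tensorflow as tf
from tensorflow.keras import layers

mask = layers.Conv2D(1, 1, activation='sigmoid', name='mask')(decoder_out)
boundary = layers.Conv2D(1, 1, activation='sigmoid', name='boundary')(decoder_out)
pivot_net = tf.keras.Model(inputs=inputs, outputs=[mask, boundary])
```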

3.1.5 Loss function

In both the CPIS segmentation and boundary detection tasks, Pivot-Net aims to identify two classes (i.e. CPIS and background). The Dice coefficient, also known as the F1-score, is a measurement of the overlap and similarity between two sets and is the harmonic mean of precision and recall. The corresponding loss function, Dice loss, is commonly used and encourages the model to perform well in both precision and recall. So, Dice loss was used as the loss based on the CPIS label and the predicted extent. The Dice coefficient can be defined as in formula (4):

(4) $Dice = \dfrac{2|A \cap B|}{|A| + |B|}$

where $A$ and $B$ are countable sets. Then the Dice loss ($L_{Dice}$) with smoothing can be represented by formula (5):

(5) $L_{Dice} = 1 - \dfrac{2|A \cap B| + smooth}{|A| + |B| + smooth}$

where $smooth$ is a smoothing factor, usually set to 1. As mentioned above, Pivot-Net predicts a dual task, so the final loss function was defined as the weighted sum of the segmentation and boundary detection losses:

(6) $L_{total} = \lambda_1 L_{Dice}^{seg} + \lambda_2 L_{Dice}^{boundary}$

where $L_{total}$ is the total loss, and $\lambda_1$ and $\lambda_2$ are the weights of the dual-task losses. Empirically, we set $\lambda_1 = 0.5$ and $\lambda_2 = 0.5$, which worked well in our experiments. $L_{Dice}^{seg}$ is the Dice loss of the segmentation task, while $L_{Dice}^{boundary}$ is that of the boundary detection task.
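A minimal TensorFlow sketch of formulas (5)-(6) follows.

```python
# Smoothed Dice loss per head (formula 5) and the equally weighted
# (lambda_1 = lambda_2 = 0.5) dual-task sum (formula 6).
import tensorflow as tf

def dice_loss(y_true, y_pred, smooth=1.0):
    y_true = tf.reshape(tf.cast(y_true, tf.float32), [-1])
    y_pred = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true * y_pred)
    return 1.0 - (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + smooth)

def total_loss(seg_true, seg_pred, bnd_true, bnd_pred,
               lambda_seg=0.5, lambda_bnd=0.5):
    return (lambda_seg * dice_loss(seg_true, seg_pred)
            + lambda_bnd * dice_loss(bnd_true, bnd_pred))
```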

3.2 Training and application details

Our model was implemented in Keras 2.3 (https://keras.io/) with the TensorFlow 2.1 backend (https://www.tensorflow.org/) and trained on four NVIDIA GeForce GTX TITAN X GPUs (12 GB RAM). Data augmentation, including zoom in/out, flip up/down, rotation, adding Gaussian/random noise, and contrast transformation, was randomly applied to each patch with a probability of 25%, and one patch could be transformed one or more times. Each patch was min-max normalized before being used as input.

We used the Adam optimizer, a widely used optimizer that adjusts the learning rate dynamically at every iteration, to train Pivot-Net. The initial learning rate was set to $1 \times 10^{-4}$. Due to the limitation of GPU memory, the batch size was set to 8 in our experiments. The maximum number of iterations was 142,800 (steps per epoch = 1428 and epochs = 100). To avoid overfitting, we employed early stopping with patience = 4, meaning that training stops if the accuracy does not improve for four epochs.
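A minimal sketch of this training configuration follows, reusing the `pivot_net` model and `dice_loss` sketched above; the data generators and the monitored quantity (validation loss here, rather than accuracy) are assumptions.

```python
# Training setup: Adam with lr 1e-4, Dice loss per head with 0.5/0.5 weights,
# 1428 steps per epoch for up to 100 epochs, early stopping with patience 4.
import tensorflow as tf

pivot_net.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss={'mask': dice_loss, 'boundary': dice_loss},
    loss_weights={'mask': 0.5, 'boundary': 0.5})

early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=4,
                                              restore_best_weights=True)

# Batch size 8 is set inside the (assumed) generators.
pivot_net.fit(train_generator, validation_data=test_generator,
              steps_per_epoch=1428, epochs=100, callbacks=[early_stop])
```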

We composited cloud-free satellite data using available optical images from Sentinel-2 and Landsat-8 for each state in GEE from March to August of 2018, which includes the start and peak of the growing season of major crops. The percentage of non-cloud cover was used as the quality band to generate the clearest image during this period. All the exported data from GEE were stored in Google Drive. We then uploaded the trained model, stored on a local high-performance computer, to Google Drive. Using the computing power of Google Colab Pro+ (https://colab.research.google.com/), which can access the satellite data in Google Drive, the well-trained Pivot-Net was applied across the entire contiguous US. All satellite data were split into 256×256 patches with an overlap of 128 pixels (stride = 128 pixels). The maximum prediction probability in each overlap region was used as the final prediction.
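The tiled prediction can be sketched as below; this is a minimal sketch that assumes the mask head is the model's first output.

```python
# Slide a 256x256 window with a 128-pixel stride over a normalized mosaic and
# keep, per pixel, the maximum predicted probability across overlapping windows.
import numpy as np

def predict_tiled(model, image, patch=256, stride=128):
    """image: (H, W, 4) min-max normalized array; returns (H, W) CPIS probabilities."""
    h, w, _ = image.shape
    prob = np.zeros((h, w), dtype=np.float32)
    for row in range(0, h - patch + 1, stride):
        for col in range(0, w - patch + 1, stride):
            tile = image[np.newaxis, row:row + patch, col:col + patch, :]
            outputs = model.predict(tile, verbose=0)
            p = outputs[0][0, :, :, 0]            # mask head (assumed first output)
            window = prob[row:row + patch, col:col + patch]
            np.maximum(window, p, out=window)     # max probability in overlaps
    return prob
```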

3.3 Model comparison

We compared Pivot-Net with seven state-of-the-art segmentation models, including UNet, HRNet, DeepLabv3+, and LinkNet with different backbones, which are implemented in the GitHub repository https://github.com/qubvel/segmentation_models.pytorch.

UNet is a classic architecture for satellite segmentation tasks and has been widely used in land-cover segmentation (Flood, Watson, and Collett Citation2019). Because U-Net is the basic architecture of Pivot-Net, we combined it with two classic backbones (ResNext50 and EfficientNetB7) to compare it with the other architectures.

DeepLabv3+ is an excellent segmentation architecture that introduces an atrous spatial pyramid pooling (ASPP) module into the encoder-decoder network, so the model can obtain a larger receptive field without losing image resolution. It achieved state-of-the-art performance in the PASCAL VOC challenge and is actively used in building footprint detection (Li and Dong Citation2022) and land cover mapping (Liu et al. Citation2021).

HRNet (High-Resolution Network) is a parallel convolution architecture that maintains high-resolution representations throughout the network (Wang et al. Citation2020). It reported state-of-the-art performance on semantic segmentation tasks in several open-access datasets (i.e. Cityscapes and COCO) (Liu et al. Citation2021).

LinkNet achieves efficient and accurate semantic segmentation with limited parameters. It achieved state-of-the-art performance in the CVPR DeepGlobe 2018 road extraction challenge (Ragettli, Herberz, and Siegfried Citation2018). The specific backbones and parameters are shown in Table 3.

Table 2. Confusion matrix for Pivot-Net in the first-stage validation dataset.

Table 3. The performance comparison between shape-attention Pivot-Net and other state-of-art models (%).

We used four metrics to evaluate the performance of the segmentation results: the F1-score, Intersection over Union (IOU), mean IOU, and overall accuracy (OA), which are defined in formulas (7)-(10):

(7) $IOU = \dfrac{TP}{TP + FP + FN}$

(8) $Mean\ IOU = \dfrac{1}{n}\sum_{k=1}^{n} IOU_k$

(9) $OA = \dfrac{TP + TN}{TP + TN + FP + FN}$

(10) $F1 = \dfrac{2TP}{2TP + FP + FN}$

where TP, TN, FP, and FN are the number of true positive, true negative, false positive, and false negative, respectively. For Mean IOU, there were two classes, CPIS and others, so n = 2.
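For completeness, these metrics can be computed from binary prediction and reference masks as in the following minimal sketch.

```python
# Evaluation metrics of formulas (7)-(10) for n = 2 classes (CPIS and others).
import numpy as np

def evaluate(pred, truth):
    tp = np.sum((pred == 1) & (truth == 1))
    tn = np.sum((pred == 0) & (truth == 0))
    fp = np.sum((pred == 1) & (truth == 0))
    fn = np.sum((pred == 0) & (truth == 1))
    iou_cpis = tp / (tp + fp + fn)
    iou_other = tn / (tn + fn + fp)                  # IOU of the background class
    return {'IOU': iou_cpis,
            'Mean IOU': (iou_cpis + iou_other) / 2,  # average over n = 2 classes
            'OA': (tp + tn) / (tp + tn + fp + fn),
            'F1': 2 * tp / (2 * tp + fp + fn)}
```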

4 Results

4.1 Results on first stage validation

Using a local server, the shape-attention Pivot-Net was trained for a maximum of 100 epochs, which took ~8.5 h. During training, the Dice coefficient on the test dataset did not improve after 40 epochs and converged after 60 epochs, as shown in Figure 6. The minimum loss on the test dataset was 0.26 at epoch 59, and the corresponding weights were saved as the best model. On the first-stage validation dataset of 1278 images, the overall accuracy was 96.63%, with a recall of 89.68% and a precision of 86.04%. The confusion matrix of the first-stage validation is shown in Table 2.

Figure 6. Process of training shape attention Pivot-Net.


To evaluate the performance of Pivot-Net, we compared its accuracy metrics with those of the seven other segmentation models. The results are shown in Table 3. The overall accuracy of Pivot-Net was the highest, followed by UNet with the EfficientNetB7 backbone, UNet with the ResNext50 backbone, and LinkNet with the ResNext50 backbone. The mean IOU of Pivot-Net was 87.22%, an improvement of 0.25% over U-Net with the EfficientNetB7 backbone and of 0.48% over LinkNet+ResNext50. For the single-class IOU of CPIS, Pivot-Net also achieved a better score than the other state-of-the-art architectures: the IOU of CPIS for UNet+EfficientNetB7 was 77.93%, while Pivot-Net was 0.36% higher at 78.29%. Besides the accuracy metrics, we also compared the number of parameters of each model. HR-Net was the lightest model with 9.52 million parameters, followed by Pivot-Net with 9.79 million parameters. However, Pivot-Net outperformed HR-Net in accuracy, with an improvement of 6.81% in CPIS IOU and 4.45% in F1-score.

4.2 Results on spatial transferring

We checked the accuracy of Pivot-Net in predicting CPIS at the national scale as the second-stage validation. After the model was well trained on the local server, we uploaded the architecture and weights to Google Drive and applied our model to predict CPIS in the three validation regions (Washington, Texas, and Georgia). The accuracy metrics are given in Table 4. The overall accuracy, with an average value of 98.13%, varied from 97.70% in Texas to 98.41% in Georgia. The IOU of CPIS was highest in Texas with a value of 87.02% and lowest in Georgia at 65.78%. The average precision was 95.83% and the average recall was 86.05%. The comparisons between predicted results and truth labels are shown in Figures 8-10.

Table 4. Metrics for model comparison in three second-stage validation areas (%).

We also applied the other state-of-the-art models at the second-stage validation sites; the results are listed in Table 4 and Figure 7. Compared with the first-stage results, Pivot-Net achieved an even larger performance margin at the second-stage validation sites, which are far from the training and testing area. In terms of F1-score, Pivot-Net was 4.94%, 4.57%, and 3.74% higher in Washington, Georgia, and Texas, respectively, than the second most accurate model. Pivot-Net also had the highest IOU, which was 7.78%, 5.15%, and 6.31% higher than the other models, respectively. As for the other models, UNet+EfficientNetB7 performed well in Washington and Texas, while HR-Net worked well in Georgia.

Figure 7. Average metric for model comparison in three second-stage validation areas.


The Pivot-Net results for Washington are shown in Figure 8. Overall, the result was similar to the truth in the north and south of the validation region. However, there are many false negatives between these regions, as depicted in Figure 8, due to the heterogeneous tones and inconspicuous shapes of CPIS there. In this validation region, the precision was 98.04%, the recall was 82.82%, the F1-score was 89.79%, and the IOU of CPIS was 81.47%.

Figure 8. Comparison between predicted results and truth in the Washington validation site. (a) is the overall figure. (b,d,f) are the detailed maps, and (c,e,g) are the corresponding false-color composited satellite images. TP, FN, and FP represent true positive, false negative, and false positive samples, respectively.


In Texas, the comparison between predicted results and truth is shown in Figure 9. Overall, most of the CPIS was correctly identified, with an overall accuracy of 97.70% and a mean IOU of 92.15%. Figure 9(b,d,f) shows detailed maps of the northern, middle, and southern portions of the validation site, respectively. In this validation region, the precision was 95.30% and the recall was 90.93%, the highest among the three validation sites. The F1-score was 93.06% and the IOU of CPIS was 87.02%.

Figure 9. Comparison between predicted results and truth labels in the Texas validation site. (a) is the overall figure. (b,d,f) are the detailed maps, and (c,e,g) are the corresponding false-color composited satellite images. TP, FN, and FP represent true positive, false negative, and false positive samples, respectively.


Figure 10 shows the predicted result in Georgia. Overall, some CPIS was correctly identified, with an overall accuracy of 98.41% and a mean IOU of 82.07%. Figure 10(b,d,f) shows maps of the northern, middle, and southern portions of the validation site, respectively. The CPIS was not as easily identified in Georgia as in Washington and Texas, even by manual visual interpretation, because the CPIS had inhomogeneous color, tone, and shape. In this validation region, the precision was 92.73% and the recall was 70.44%, the lowest of the three validation sites. The F1-score was 80.06% and the IOU of CPIS was 65.78%. Because HR-Net maintains high-resolution representations throughout the network, it achieved the best metrics here among all models except Pivot-Net.

Figure 10. Comparison between predicted results and truth labels in the Georgia validation site. (a) is the overall figure. (b,d,f) are the detailed maps, and (c,e,g) are the corresponding false-color composited satellite images. TP, FN, and FP represent true positive, false negative, and false positive samples, respectively.


4.3 CPIS distribution in the contiguous US

After validating the accuracy in Washington, Texas, and Georgia, the model was applied to the 48 states of the contiguous US. Cloud-free images were composited in GEE and stored in Google Drive, with a total size of 115 GB. We applied the well-trained Pivot-Net using Google Colab Pro+. The distribution of CPIS in the contiguous US is shown in Figure 11. The CPIS were densely distributed in the High Plains of Nebraska, Kansas, Texas, and Colorado, as well as southern Washington State, Idaho, and Georgia. In California, which has one of the largest irrigation districts, the CPIS area was not very large, indicating that center-pivot sprinkler irrigation was not the dominant irrigation method in this region. In the central High Plains, CPIS was the prevailing irrigation method.

Figure 11. The distribution of CPIS in the contiguous US (a). (b-j) are detailed maps in Washington, Idaho, Nebraska, Wisconsin, Kansas, Georgia, lower Texas, upper Texas, Arizona, and Colorado.


We estimated the total area of CPIS for the entire contiguous US to be 61,094 km2. The CPIS area of each state is shown in Figure 12. Nebraska had the largest CPIS area (13,148 km2) among all states in the contiguous US, followed by Texas (13,102 km2), Kansas (9332 km2), and Colorado (4911 km2). Outside the High Plains, CPIS was mostly distributed in Idaho (4329 km2) and Washington (2731 km2). The CPIS area in Georgia (1155 km2) is also considerable. According to Xie et al. (Citation2019), there were 23.3 million ha of irrigated cropland in the contiguous US. Using this figure, we estimated that CPIS comprised 26.22% of all irrigated lands. Our results can be accessed at https://tianfyou.users.earthengine.app/view/cpisus and http://cropwatch.com.cn/newcropwatch-test/main.htm?language=ch&token=crop-627c67d2a44f50.70746342#.

Figure 12. Area of CPIS in each state in the contiguous US.


5 Discussion

Here we demonstrated that the inclusion of shape information in Pivot-Net, a model based on U-Net that comprises a spatial-level attention gate and residual blocks, helped to better identify CPIS in satellite images. Our model was trained with a dual task that included the extent and boundary of CPIS to capture its round shape. We trained our model on 11,424 images of Kansas. Overall, the shape-aware Pivot-Net we developed outperformed the U-Net model in both precision and IOU on the test dataset. We conducted an ablation experiment to verify the contributions of the spatial-level attention gate and the multi-task training method, which demonstrated that both improved classification accuracy. To evaluate the spatial transferability of Pivot-Net to other states, three validation areas in Washington, Texas, and Georgia were used to validate the well-trained model. The average F1-score in the validation regions was 90.68% and the IOU of CPIS was 82.94%. Finally, we applied the model to the other states of the contiguous US with the help of Google Colab and the GEE platform.

5.1 Ablation study of the proposed shape-attention gate

When identifying CPIS in Brazil, de Albuquerque et al. (Citation2020) compared U-Net, Deep ResUnet, and SharpMask and found that U-Net and Deep ResUnet outperformed the other architecture according to F1-score and recall. Likewise, we compared the proposed shape-attention Pivot-Net with the U-Net and Deep ResUnet models to estimate the contributions of the spatial-level attention gate and residual block. The deep residual U-Net (Deep ResUnet) combines the strengths of deep residual learning and the U-Net architecture (Zhang, Liu, and Wang Citation2018) and can be implemented via https://github.com/nikhilroxtomar/Deep-Residual-Unet.

To evaluate the performance of Pivot-Net, we compared its accuracy metrics with those of U-Net and Deep ResUnet. Four experiments were set up to test the performance of Pivot-Net, U-Net, and Deep ResUnet. First, Pivot-Net was trained with a single task and with multi-task learning to estimate the contribution of multi-task learning. Second, to confirm the contribution of the residual block, U-Net and Deep ResUnet were also trained with multi-task learning. Third, to estimate the spatial-level attention gate, Deep ResUnet trained with multi-task learning was compared with Pivot-Net. Fourth, the difference between shape-attention Pivot-Net and vanilla U-Net reflects the combined contribution of the spatial-level attention gate, residual block, and multi-task learning.

To confirm the contribution of multi-task training, we trained U-Net and Deep ResUnet with multi-task learning (Table 5). We found that the IOU of CPIS for U-Net and Deep ResUnet trained with multi-task learning was 1.16% and 0.48% higher than with a single task, respectively. There was also a 1.04% increase in the F1-score when the U-Net model was trained with multi-task learning. For Pivot-Net, when trained with only the segmentation task, the IOU of CPIS was 77.92%, which was 0.37% lower than when trained with multi-task learning.

Table 5. The performance comparison between shape-attention Pivot-Net and U-Net (%).

By comparing the metrics for U-Net and Pivot-Net, the contributions of the spatial-level attention gate and residual block were estimated. From Table 5, we can conclude that there was a 1.75% increase in the F1-score between U-Net and Pivot-Net when both were trained with a single task. The residual block increased accuracy by 1.27%, as seen by comparing the F1-scores of U-Net and Deep ResUnet. The IOU of CPIS for U-Net trained with a single task was 76.82% and for Pivot-Net (single task) it was 77.92%, which indicated that the spatial-level attention gate increased the IOU of CPIS by 1.10%.

In the ablation experiment, the F1-score was 1.75% higher for Pivot-Net than for U-Net when both were trained with a single task, which confirmed that the inclusion of the spatial-level attention gate and the residual block was beneficial.

Figure 13 shows a detailed map from the test dataset, which depicts the spatial differences between Pivot-Net and U-Net. As seen in Figure 13, there is little difference between U-Net and Pivot-Net in clear scenes, where both could identify the CPIS. But in slightly fuzzy scenarios, the prediction of Pivot-Net better recovered the round shape, and its IOU was higher than that of U-Net. In the upper green rectangle of Figure 13, U-Net misclassified the CPIS. In the lower green rectangle, the result from U-Net was segmented with many small holes rather than a filled round shape, but Pivot-Net correctly identified the round shape as expected. Overall, Pivot-Net was better able to identify the round shape than U-Net.

Figure 13. Comparison between the results of Pivot-Net and U-Net. The green rectangles are the misclassified regions in U-Net.


In addition to representative features inside the object, such as color, texture, and tone, shape is also a distinctive feature for semantic segmentation (Waldner and Diakogiannis Citation2020; Takikawa et al. Citation2019; Zhao et al. Citation2018). The round shape is a distinctive feature of CPIS. To introduce shape information, we integrated a spatial-level attention gate and multi-task learning into the U-Net architecture and found that this combination could accurately identify the round shape of CPIS. To further investigate the representations learned by the models, we visualized the feature maps of the last layer of Pivot-Net, U-Net, and Deep ResUnet (Figure 14). The U-Net model learned many features of the extent and background, which may contain color, texture, and tone, while the Deep ResUnet model mainly focused on the mask of the CPIS. The feature maps of Pivot-Net, however, usually focused on the shape information, which is the primary information needed to identify CPIS.

Figure 14. Visualization of feature map in three DL architectures.


Compared with the shape-attentive U-Net proposed by Sun et al. (Citation2020), there are three major differences. First, we added the residual block (Figure 5(b)) as the backbone of Pivot-Net, while the backbone of the shape-attentive U-Net is the VGG-16 network. The residual block alleviates gradient vanishing during model training, which has been widely verified. Second, there are differences in the attention mechanism. The shape-attentive U-Net used a dual (spatial & channel) attention block. But for CPIS identification, spatial attention is more important than channel attention, because the four input bands already provide useful information. So Pivot-Net only employed spatial-level attention, which compresses the parameters, thus reducing computational complexity and the possibility of overfitting. Third, the shape information extraction method in Pivot-Net differs from that of the shape-attentive U-Net. The former network designed a two-stream architecture to extract texture and shape information separately, while Pivot-Net "teaches" the model to extract shape information directly via one stream regardless of texture information, because shape is the distinguishing feature in CPIS identification.

5.2 Generalizability of multi-sensor and time-series images

Besides conventional data augmentation transformations, we also used images from time-series and multiple sensors to augment the dataset. Unlike ordinary RGB images, satellite images are multi-spectral or multi-band (Li et al. Citation2019; Pinto et al. Citation2020). Meanwhile, time-series and multi-sensor satellite images have improved the classification of land surface types. The segmentation of satellite images thus differs from typical computer vision tasks, which are usually based on single-date images, in that time-series and multi-sensor images of the same location are available. Time-series images are often used to monitor crop yields (You et al. Citation2017), map crops (Tian et al. Citation2019; Wang et al. Citation2021), and map land cover (Ienco et al. Citation2019; Pinto et al. Citation2020). Classification schemes that use time-series images often outperform those that use images from a single date (de Albuquerque et al. Citation2021).

We used time-series and multi-sensor image augmentation together with image manipulations, such as flipping, adding noise, and cropping, to diversify the training samples and improve the classification. The data generated by each satellite observation of the same location varies with acquisition time and sensor, which serves as a form of "data augmentation" that enlarges the training data. By employing time-series and multi-sensor image augmentation, we not only expanded the volume of the training dataset but also made the model more general. When applying Pivot-Net to the contiguous US, we found the model to be robust for classifying CPIS across time-series images.

There is an obvious spectral difference between Sentinel-2 and Landsat-8 data, as shown in Figure 3(f,h), although the surface reflectance is in the same range (0-1) and the numeric values are comparable. Usually, these two spectral curves would need to be "aligned" or "normalized". However, a human can easily interpret both types of images, even without being able to tell them apart, so a general DL model should likewise be able to recognize multi-sensor images with a given band combination regardless of sensor type. To achieve this, we employed multi-sensor data augmentation, a special operation in this study that enables the DL model to "understand" multi-sensor satellite images. Specifically, we used the same label files with multi-sensor images to enlarge the number of samples from 5559 to 11,424. As a result, the model could identify CPIS from both Sentinel-2 and Landsat-8 data using the four selected bands.

Some researchers have employed object-based methods to detect CPIS in arid regions, where the CPIS is visually obvious (Johansen et al. Citation2021; Li, Johansen, and McCabe Citation2022). Nevertheless, such methods are hard to apply in sub-humid or humid regions where the CPIS is homogeneous with neighboring agricultural fields. The territory of the United States covers approximately 9.17 million km2 with very complex landscapes, providing an ideal study area to test the robustness and generalizability of Pivot-Net. The results indicated that our model showed good recognition ability for CPIS across the whole US, even in states with scattered CPIS distributions, such as Iowa and Tennessee. Therefore, our model could potentially be used to identify CPIS in Central Asia and northern China. In the future, we will further assess the transferability of the CPIS model to other regions of the world.

5.3 Potential usage of CPIS identification at large spatial scales

CPIS is characterized by high irrigation efficiency, defined as the ratio of water used by crops to total water withdrawn from a water source, which depends largely on the irrigation method (Ma et al. Citation2018). The average efficiency has been estimated at 65% for surface irrigation, 80% for sprinkler irrigation, and 90% for drip irrigation (Howell Citation2003). Thus, the proportion of each irrigation method reflects the average irrigation efficiency at the basin scale. With the help of GEE and Google Colab, we mapped CPIS in the contiguous US and found the CPIS area to be 61,094 km2. Compared with the annual irrigation maps across the entire High Plains Aquifer (AIM-HPA_rse) produced by Deines et al. (Citation2019), the spatial distribution is mostly consistent (Figure 15). Four detailed maps in Nebraska, Kansas, and Texas are also illustrated (Figure 15(b–g), (c–h), (d–i), and (e–j)). The CPIS identified by Pivot-Net filled the gaps and holes in AIM-HPA_rse. According to Xie et al. (Citation2019), there were 23.3 million ha of irrigated cropland in the contiguous US; we therefore estimate that 26.22% of irrigated land is CPIS. Moreover, the prevailing irrigation method for each irrigation district can be estimated, helping to address the irrigation efficiency paradox (Grafton et al. Citation2018): despite great effort and technological advances to improve irrigation efficiency, available crop water does not appear to have increased.
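
As a worked illustration of this basin-scale reasoning, an area-weighted average efficiency follows directly from the method proportions; the basin shares in the example below are purely hypothetical.

```python
# Published per-method efficiencies (Howell 2003).
EFFICIENCY = {'surface': 0.65, 'sprinkler': 0.80, 'drip': 0.90}

def basin_efficiency(shares):
    """shares: fraction of irrigated area per method, summing to 1."""
    return sum(shares[m] * EFFICIENCY[m] for m in shares)

# e.g. a basin that is 50% surface, 40% sprinkler (incl. CPIS), 10% drip:
print(basin_efficiency({'surface': 0.5, 'sprinkler': 0.4, 'drip': 0.1}))
# -> 0.735, i.e. an average basin-scale irrigation efficiency of 73.5%
```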

Figure 15. Comparison between the proposed CPIS map and the annual irrigation maps across the entire High Plains Aquifer (AIM-HPA_rse) (Deines et al. Citation2019). (a–e) Pivot-Net CPIS maps; (f–j) AIM-HPA_rse.

Water accounting and measurement should be conducted at the basin scale, where the largest water users and the most common irrigation methods can be determined (Knox, Kay, and Weatherhead Citation2012). By combining Google Colab Pro+ and GEE, we applied the well-trained Pivot-Net model to the entire contiguous US, and it would also be possible to apply the model to map CPIS in other regions of the world. Our Pivot-Net framework offers several advantages:

  1. Our Pivot-Net framework significantly improved the accuracy of mapping CPIS.

  2. With the help of GEE, cloud-free satellite images can be composited easily, avoiding the time-consuming download of raw satellite data (see the sketch after this list).

  3. Our model can be applied to all the data via the Colab platform, so CPIS can be extracted at very low cost.

  4. It took only ~8.5 h to predict CPIS for the entire contiguous US. Thus, our framework can not only accurately map CPIS to support water resource accounting and management but can also be adapted to identify other land cover types with distinctive shapes at large scales.
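
As referenced in advantage 2, the following minimal sketch shows this kind of cloud-free compositing with the GEE Python API; the collection, date range, cloud threshold, and band selection are illustrative assumptions rather than the exact settings of this study.

```python
import ee

ee.Initialize()

# Cloud-screened growing-season median composite, four bands for inference.
composite = (ee.ImageCollection('COPERNICUS/S2_SR')
             .filterDate('2018-06-01', '2018-09-01')
             .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 10))
             .median()  # per-pixel median suppresses residual clouds
             .select(['B2', 'B3', 'B4', 'B8']))  # blue, green, red, NIR
```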

5.4 Limitations and outlook

Some issues remain with our results, and several improvements could be made. First, although Pivot-Net classified CPIS with high accuracy, a well-trained model may lose some accuracy during spatial transfer (Zheng et al. Citation2020). There were some CPIS misclassifications in certain regions due to confusing satellite imagery and algorithm limitations. The misclassification in Georgia was mainly due to mixed vegetation signals within a single CPIS, while the omission of CPIS in Washington was due to ambiguous shape features and the limited regional adaptability of the model.

Meanwhile, recent studies have used instance segmentation (de Albuquerque et al. Citation2021) and object detection methods (Carvalho Osmar et al. Citation2020) to identify CPIS and related land covers, which can also retrieve the number, extent, and mask of each CPIS. Alternatively, we could add a watershed-distance task to separate contiguous CPIS, as Waldner and Diakogiannis (Citation2020) did for field segmentation, thereby achieving instance segmentation as well (see the sketch after this paragraph). Overall, the community should pay more attention to enhancing the generalizability and spatial transferability of deep learning models in order to promote the application of DL to satellite image land cover classification, especially at large spatial scales. In the future, model transferability should be assessed at the global scale, because the distribution of CPIS in other regions, such as Central Asia and China, is relatively scattered and random compared to the US. More labels should be annotated for each region to enhance the model's applicability to specific areas.
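
To make the watershed idea concrete, the sketch below shows a conventional distance-transform-plus-watershed post-processing step, assuming scikit-image and SciPy are available; it illustrates how touching pivots in a binary mask could be split into instances and is not part of Pivot-Net itself.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def separate_cpis(mask, min_distance=15):
    """Split a binary CPIS mask (H, W, bool) into labeled pivot instances."""
    # Distance to the nearest background pixel: pivot centers score highest.
    distance = ndi.distance_transform_edt(mask)
    # Candidate pivot centers are local maxima of the distance map.
    coords = peak_local_max(distance, min_distance=min_distance,
                            labels=mask.astype(int))
    markers = np.zeros(mask.shape, dtype=np.int32)
    markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)
    # Flood from the markers over the inverted distance map, within the mask.
    return watershed(-distance, markers, mask=mask)
```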

Second, although images from more than one optical sensor were used to train the DL model, we did not use synthetic aperture radar (SAR) data, which can penetrate cloud cover and thus observe the same location regardless of weather conditions (Bofana et al. Citation2022; de Albuquerque et al. Citation2021). Incorporating SAR data into our model may therefore improve the accuracy of CPIS classification and other land cover mapping efforts (Ienco et al. Citation2019). Likewise, data from other satellite platforms could be fed into the Pivot-Net model in the future to improve land cover classification.

Third, some CPIS croplands had been abandoned and were not productive in the year of our classification. After identifying present and past CPIS from their round shapes, vegetation indices could be used to determine whether each CPIS was actually vegetated.
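
The following minimal sketch shows such a check with the GEE Python API; the feature-collection asset path, date window, and 0.3 NDVI threshold are illustrative assumptions.

```python
import ee

ee.Initialize()

cpis = ee.FeatureCollection('users/example/cpis_polygons')  # hypothetical asset

def ndvi(img):
    # Apply the Collection 2 Level-2 scale/offset, then compute NDVI.
    sr = img.select(['SR_B5', 'SR_B4']).multiply(0.0000275).add(-0.2)
    return sr.normalizedDifference(['SR_B5', 'SR_B4']).rename('NDVI')

peak = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
        .filterDate('2018-05-01', '2018-09-30')
        .map(ndvi)
        .max())  # peak growing-season NDVI; max also dampens cloudy scenes

# Mean peak NDVI per pivot; low values suggest a bare or abandoned pivot.
stats = peak.reduceRegions(collection=cpis,
                           reducer=ee.Reducer.mean(),
                           scale=30)
abandoned = stats.filter(ee.Filter.lt('mean', 0.3))  # threshold is illustrative
```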

6 Conclusion

Irrigation maps stratified by irrigation type are important for agricultural water management. Here, we developed a shape-attention Pivot-Net to identify CPIS in satellite images. By integrating a spatial-attention gate, residual blocks, and multi-task learning, Pivot-Net successfully extracted the round shapes of CPIS. The spatial-level attention gate makes the DL model focus on spatial information, while the boundary-detection branch of the multi-task learning introduces more representative round-shape features. The mean IOU of Pivot-Net was 87.22% on the test dataset, outperforming several current state-of-the-art models. Because the shape information was well learned, Pivot-Net achieved promising results during spatial transfer at three validation sites, with an overall accuracy of 98.13% and a mean IOU of 90.45%.

With the support of a cloud computing platform, the first 30 m CPIS map for the contiguous US was generated efficiently. Our results can be accessed at https://tianfyou.users.earthengine.app/view/cpisus. On one hand, our approach can serve as a general framework for applying well-trained DL models at large scales, turning well-designed DL algorithms into DL-based Earth observation products. On the other hand, our model can be used to determine where center pivot irrigation is employed, and this information can be used to estimate irrigation efficiency at the basin scale to support water resource management, especially for the largest water-use sector: agricultural water use.

Data availability

The results can be accessed at https://tianfyou.users.earthengine.app/view/cpisus. The data that support the findings of this study are available from the corresponding author, Bingfang Wu, upon reasonable request.


Acknowledgment

This research was financially supported by the National Key Research and Development Project of China (No. 2019YFE0126900), the National Natural Science Foundation of China (41861144019, 42071271), and the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA19030201).

Disclosure statement

No potential conflict of interest was reported by the author(s).

Supplementary material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/15481603.2023.2165256

References

  • Bofana, J., M. Zhang, B. Wu, H. Zeng, M. Nabil, N. Zhang, A. Elnashar, F. Tian, J. M. da Silva, A. Botão, and A. Atumane. 2022. “How Long Did Crops Survive from Floods Caused by Cyclone Idai in Mozambique Detected with Multi-Satellite Data.” Remote Sensing of Environment 269: 112808.
  • Boryan, C., Z. Yang, R. Mueller, and M. Craig. 2011. “Monitoring US Agriculture: The US Department of Agriculture, National Agricultural Statistics Service, Cropland Data Layer Program.” Geocarto International 26 (5): 341–358. doi:10.1080/10106049.2011.562309.
  • Brocca, L., A. Tarpanelli, P. Filippucci, W. Dorigo, F. Zaussinger, A. Gruber, and D. Fernández-Prieto. 2018. “How Much Water is Used for Irrigation? A New Approach Exploiting Coarse Resolution Satellite Soil Moisture Products.” International Journal of Applied Earth Observation and Geoinformation 73: 752–766. doi:10.1016/j.jag.2018.08.023.
  • Carvalho Osmar, L. F., O. A. de Carvalho Júnior, A. O. de Albuquerque, P. P. de Bem, C. Rosa Silva, P. Henrique Guimarães Ferreira, R. dos Santos de Moura, R. Arnaldo Trancoso Gomes, R. Fontes Guimarães, and D. Leandro Borges. 2020. “Instance Segmentation for Large, Multi-Channel Remote Sensing Imagery Using Mask-RCNN and a Mosaicking Approach.” Remote Sensing 13 (1): 39. doi:10.3390/rs13010039.
  • Chen, Y., D. Lu, L. Luo, Y. Pokhrel, K. Deb, J. Huang, and Y. Ran. 2018. “Detecting Irrigation Extent, Frequency, and Timing in a Heterogeneous Arid Agricultural Region Using MODIS Time Series, Landsat Imagery, and Ancillary Data.” Remote Sensing of Environment 204: 197–211. doi:10.1016/j.rse.2017.10.030.
  • Chen, X., F. Wang, L. Jiang, C. Huang, P. An, and Z. Pan. 2019. “Impact of Center Pivot Irrigation on Vegetation Dynamics in a Farming-Pastoral Ecotone of Northern China: A Case Study in Ulanqab, Inner Mongolia.” Ecological Indicators 101: 274–284. doi:10.1016/j.ecolind.2019.01.027.
  • de Albuquerque, A. O., O. L. F. de Carvalho, C. R. e Silva, P. P. de Bem, R. A. Trancoso Gomes, D. Leandro Borges, R. Fontes Guimarães, C. M. McManus Pimentel, and O. A. de Carvalho Júnior. 2021. “Instance Segmentation of Center Pivot Irrigation Systems Using Multi-Temporal SENTINEL-1 SAR Images.” Remote Sensing Applications: Society and Environment 23: 100537.
  • de Albuquerque, A. O., O. L. F. de Carvalho Júnior, A. S. Cristiano Rosa e Silva, P. Luiz, P. de Bem, R. A. Trancoso Gomes, R. F. Fontes Guimarães, and O. A. de Carvalho Júnior. 2021. “Dealing with Clouds and Seasonal Changes for Center Pivot Irrigation Systems Detection Using Instance Segmentation in Sentinel-2 Time Series.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14: 8447–8457.
  • de Albuquerque, A. O., O. A. de Carvalho Júnior, O. L. F. de Carvalho, P. P. de Bem, P. H. Guimarães Ferreira, R. dos Santos de Moura, C. Rosa Silva, R. A. Trancoso Gomes, and R. A. Fontes Guimarães. 2020. “Deep Semantic Segmentation of Center Pivot Irrigation Systems from Remotely Sensed Data.” Remote Sensing 12 (13): 2159. doi:10.3390/rs12132159.
  • Deines, J. M., A. D. Kendall, M. A. Crowley, J. Rapp, J. A. Cardille, and D. W. Hyndman. 2019. “Mapping Three Decades of Annual Irrigation Across the US High Plains Aquifer Using Landsat and Google Earth Engine.” Remote Sensing of Environment 233: 111400. doi:10.1016/j.rse.2019.111400.
  • Dheeravath, V., P. S. Thenkabail, G. Chandrakantha, P. Noojipady, G. P. O. Reddy, C. M. Biradar, M. K. Gumma, and M. Velpuri. 2010. “Irrigated Areas of India Derived Using MODIS 500 M Time Series for the Years 2001–2003.” ISPRS Journal of Photogrammetry and Remote Sensing 65 (1): 42–59. doi:10.1016/j.isprsjprs.2009.08.004.
  • Dieter, C. A., M. A. Maupin, R. R. Caldwell, M. A. Harris, T. I. Ivahnenko, J. K. Lovelace, N. L. Barber, and K. S. Linsey. 2018. Estimated use of water in the United States in 2015. U.S. Geological Survey Circular 1441, p. 65. doi: 10.3133/cir1441.
  • Droogers, P., W. W. Immerzeel, and I. J. Lorite. 2010. “Estimating Actual Irrigation Application by Remotely Sensed Evapotranspiration Observations.” Agricultural Water Management 97 (9): 1351–1359. doi:10.1016/j.agwat.2010.03.017.
  • Falk, T., D. Mai, R. Bensch, Ö. Çiçek, A. Abdulkadir, Y. Marrakchi, A. Böhm, et al. 2019. “U-Net: Deep Learning for Cell Counting, Detection, and Morphometry.” Nature Methods 16 (1): 67–70.
  • Flood, N., F. Watson, and L. Collett. 2019. “Using a U-Net Convolutional Neural Network to Map Woody Vegetation Extent from High Resolution Satellite Imagery Across Queensland, Australia.” International Journal of Applied Earth Observation and Geoinformation 82: 101897. doi:10.1016/j.jag.2019.101897.
  • Fu, J., J. Liu, H. Tian, L. Yong, Y. Bao, Z. Fang, and L. Hanqing. 2019. “Dual Attention Network for Scene Segmentation.” Paper presented at the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA.
  • Geirhos, R., P. Rubisch, C. Michaelis, M. Bethge, F. A. Wichmann, and W. Brendel. 2018. “ImageNet-Trained CNNs are Biased Towards Texture; Increasing Shape Bias Improves Accuracy and Robustness.“ International Conference on Learning Representations (ICLR), New Orleans, Louisiana.
  • Grafton, R. Q., J. Williams, C. J. Perry, F. Molle, C. Ringler, P. Steduto, B. Udall, et al. 2018. “The Paradox of Irrigation Efficiency.” Science 361 (6404): 748–750.
  • He, K., X. Zhang, S. Ren, and J. Sun. 2016. “Deep Residual Learning for Image Recognition.” Paper presented at the Proceedings of the IEEE conference on computer vision and pattern recognition, Las Vegas, Nevada.
  • Hinton, G. E., and R. R. Salakhutdinov. 2006. “Reducing the Dimensionality of Data with Neural Networks.” Science 313 (5786): 504–507.
  • Howell, T. A. 2003. “Irrigation Efficiency.” Encyclopedia of Water Science 467: 500.
  • Ienco, D., R. Interdonato, R. Gaetano, and D. Ho Tong Minh. 2019. “Combining Sentinel-1 and Sentinel-2 Satellite Image Time Series for Land Cover Mapping via a Multi-Source Deep Learning Architecture.” ISPRS Journal of Photogrammetry and Remote Sensing 158: 11–22. doi:10.1016/j.isprsjprs.2019.09.016.
  • Jeppesen, J. H., R. H. Jacobsen, F. Inceoglu, and T. S. Toftegaard. 2019. “A Cloud Detection Algorithm for Satellite Imagery Based on Deep Learning.” Remote Sensing of Environment 229: 247–259. doi:10.1016/j.rse.2019.03.039.
  • Johansen, K., O. Lopez, Y. H. Tu, T. Li, and M. F. McCabe. 2021. “Center Pivot Field Delineation and Mapping: A Satellite-Driven Object-Based Image Analysis Approach for National Scale Accounting.” ISPRS Journal of Photogrammetry and Remote Sensing 175: 1–19.
  • Knox, J. W., M. G. Kay, and E. K. Weatherhead. 2012. “Water Regulation, Crop Production, and Agricultural Water Management—understanding Farmer Perspectives on Irrigation Efficiency.” Agricultural Water Management 108: 3–8. doi:10.1016/j.agwat.2011.06.007.
  • Li, Y., W. Chen, Y. Zhang, C. Tao, R. Xiao, and Y. Tan. 2020. “Accurate Cloud Detection in High-Resolution Remote Sensing Imagery by Weakly Supervised Deep Learning.” Remote Sensing of Environment 250: 112045. doi:10.1016/j.rse.2020.112045.
  • Li, Z., and J. Dong. 2022. “A Framework Integrating DeeplabV3+, Transfer Learning, Active Learning, and Incremental Learning for Mapping Building Footprints.” Remote Sensing 14 (19): 4738. doi:10.3390/rs14194738.
  • Li, T., K. Johansen, and M. F. McCabe. 2022. “A Machine Learning Approach for Identifying and Delineating Agricultural Fields and Their Multi-Temporal Dynamics Using Three Decades of Landsat Data.” ISPRS Journal of Photogrammetry and Remote Sensing 186: 83–101. doi:10.1016/j.isprsjprs.2022.02.002.
  • Li, Z., H. Shen, Q. Cheng, Y. Liu, S. You, and Z. He. 2019. “Deep Learning Based Cloud Detection for Medium and High Resolution Remote Sensing Images of Different Sensors.” ISPRS Journal of Photogrammetry and Remote Sensing 150: 197–212. doi:10.1016/j.isprsjprs.2019.02.017.
  • Liu, M., F. Bolin, D. Fan, P. Zuo, S. Xie, H. Hongchang, L. Liu, L. Huang, E. Gao, and M. Zhao. 2021. “Study on Transfer Learning Ability for Classifying Marsh Vegetation with Multi-Sensor Images Using DeepLabv3+ and HRNet Deep Learning Algorithms.” International Journal of Applied Earth Observation and Geoinformation 103: 102531. doi:10.1016/j.jag.2021.102531.
  • Liu, C., Q. Zhang, S. Tao, J. Qi, M. Ding, Q. Guan, B. Wu, et al. 2020. “A New Framework to Map Fine Resolution Cropping Intensity Across the Globe: Algorithm, Validation, and Implication.” Remote Sensing of Environment 251: 112095. doi:10.1016/j.rse.2020.112095.
  • Long, J., E. Shelhamer, and T. Darrell. 2015. “Fully Convolutional Networks for Semantic Segmentation.” Paper presented at the Proceedings of the IEEE conference on computer vision and pattern recognition, Boston, Massachusetts.
  • Ma, Y., S. Liu, L. Song, Z. Xu, Y. Liu, T. Xu, and Z. Zhu. 2018. “Estimation of Daily Evapotranspiration and Irrigation Water Efficiency at a Landsat-Like Scale for an Arid Irrigation Area Using Multi-Source Remote Sensing Data.” Remote Sensing of Environment 216: 715–734.
  • Mummadi, C. K., R. Subramaniam, R. Hutmacher, J. Vitay, V. Fischer, and J. Hendrik Metzen. 2021. “Does Enhanced Shape Bias Improve Neural Network Robustness to Common Corruptions?” International Conference on Learning Representations (ICLR).
  • Ozdogan, M., and G. Gutman. 2008. “A New Methodology to Map Irrigated Areas Using Multi-Temporal MODIS and Ancillary Data: An Application Example in the Continental US.” Remote Sensing of Environment 112 (9): 3520–3537. doi:10.1016/j.rse.2008.04.010.
  • Ozdogan, M., Y. Yang, G. Allez, and C. Cervantes. 2010. “Remote Sensing of Irrigated Agriculture: Opportunities and Challenges.” Remote Sensing 2 (9): 2274–2304. doi:10.3390/rs2092274.
  • Pauls, A., and J. A. Yoder. 2018. Determining Optimum Drop-Out Rate for Neural Networks. Duluth, Minnesota.
  • Perez, L., and J. Wang. 2017. “The Effectiveness of Data Augmentation in Image Classification Using Deep Learning.“ Convolutional Neural Networks Vis. Recognit 11: 1–8.
  • Pinheiro, P. O., T. Y. Lin, R. Collobert, and P. Dollár. 2016. “Learning to Refine Object Segments.” Paper presented at the European conference on computer vision, Amsterdam, the Netherlands.
  • Pinto, M. M., R. Libonati, R. M. Trigo, I. F. Trigo, and C. C. DaCamara. 2020. “A Deep Learning Approach for Mapping and Dating Burned Areas Using Temporal Sequences of Satellite Images.” ISPRS Journal of Photogrammetry and Remote Sensing 160: 260–274. doi:10.1016/j.isprsjprs.2019.12.014.
  • Ragettli, S., T. Herberz, and T. Siegfried. 2018. “An Unsupervised Classification Algorithm for Multi-Temporal Irrigated Area Mapping in Central Asia.” Remote Sensing 10 (11): 1823. doi:10.3390/rs10111823.
  • Salmon, J. M., M. A. Friedl, S. Frolking, D. Wisser, and E. M. Douglas. 2015. “Global Rain-Fed, Irrigated, and Paddy Croplands: A New High Resolution Map Derived from Remote Sensing, Crop Inventories and Climate Data.” International Journal of Applied Earth Observation and Geoinformation 38: 321–334. doi:10.1016/j.jag.2015.01.014.
  • Saraiva, M., É. Protas, M. Salgado, and C. Souza. 2020. “Automatic Mapping of Center Pivot Irrigation Systems from Satellite Images Using Deep Learning.” Remote Sensing 12 (3): 558. doi:10.3390/rs12030558.
  • Scanlon, B. R., C. C. Faunt, L. Longuevergne, R. C. Reedy, W. M. Alley, V. L. McGuire, and P. B. McMahon. 2012. “Groundwater Depletion and Sustainability of Irrigation in the US High Plains and Central Valley.” Proceedings of the National Academy of Sciences 109 (24): 9320–9325.
  • Shi, B., D. Zhang, Q. Dai, Z. Zhu, M. Yadong, and J. Wang. 2020. “Informative Dropout for Robust Representation Learning: A Shape-Bias Perspective.” Paper presented at the International Conference on Machine Learning, Vienna, Austria.
  • Sun, J., F. Darbeha, M. Zaidi, and B. Wang. 2020. “SAUNet: Shape Attentive U-Net for Interpretable Medical Image Segmentation.” Paper presented at the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Lima, Peru. Cham: Springer.
  • Takikawa, T., D. Acuna, V. Jampani, and S. Fidler. 2019. “Gated-Scnn: Gated Shape Cnns for Semantic Segmentation.” Paper presented at the Proceedings of the IEEE/CVF International Conference on Computer Vision, Long Beach, California.
  • Tian, F., B. Wu, H. Zeng, X. Zhang, and J. Xu. 2019. “Efficient Identification of Corn Cultivation Area with Multitemporal Synthetic Aperture Radar and Optical Images in the Google Earth Engine Cloud Platform.” Remote Sensing 11 (6): 629. doi:10.3390/rs11060629.
  • USDA. 2021. “Irrigation & Water Use.” Accessed November 10. https://www.ers.usda.gov/topics/farm-practices-management/irrigation-water-use/
  • Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. 2017. “Attention is All You Need.” Paper presented at Advances in Neural Information Processing Systems, Long Beach, California.
  • Waldner, F., and F. I. Diakogiannis. 2020. “Deep Learning on Edge: Extracting Field Boundaries from Satellite Images with a Convolutional Neural Network.” Remote Sensing of Environment 245: 111741. doi:10.1016/j.rse.2020.111741.
  • Waller, P., and M. Yitayew. 2016. “Center Pivot Irrigation Systems.” In Irrigation and Drainage Engineering, edited by P. Waller and M. Yitayew, 209–228. Cham: Springer International Publishing.
  • Wang, F., M. Jiang, C. Qian, S. Yang, L. Cheng, H. Zhang, X. Wang, and X. Tang. 2017. “Residual Attention Network for Image Classification.” Paper presented at the Proceedings of the IEEE conference on computer vision and pattern recognition, Honolulu, Hawaii.
  • Wang, J., K. Sun, T. Cheng, B. Jiang, C. Deng, Y. Zhao, D. Liu, et al. 2020. “Deep High-Resolution Representation Learning for Visual Recognition.” IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (10): 3349–3364.
  • Wang, Y., Z. Zhang, L. Feng, Y. Ma, and Q. Du. 2021. “A New Attention-Based CNN Approach for Crop Mapping Using Time Series Sentinel-2 Images.” Computers and Electronics in Agriculture 184: 106090. doi:10.1016/j.compag.2021.106090.
  • Woo, S., J. Park, J.Y. Lee, and I. So Kweon. 2018. “Cbam: Convolutional Block Attention Module.” Paper presented at the Proceedings of the European conference on computer vision (ECCV), Munich, Germany.
  • Wu, B., F. Tian, M. Zhang, S. Piao, H. Zeng, W. Zhu, J. Liu, A. Elnashar, and Y. Lu. 2022. “Quantifying Global Agricultural Water Appropriation with Data Derived from Earth Observations.” Journal of Cleaner Production 358: 131891. doi:10.1016/j.jclepro.2022.131891.
  • Xie, Y., T. J. Lark, J. F. Brown, and H. K. Gibbs. 2019. “Mapping Irrigated Cropland Extent Across the Conterminous United States at 30 m Resolution Using a Semi-Automatic Training Approach on Google Earth Engine.” ISPRS Journal of Photogrammetry and Remote Sensing 155: 136–149. doi:10.1016/j.isprsjprs.2019.07.005.
  • Yi, Y., Z. Zhang, W. Zhang, C. Zhang, L. Weidong, and T. Zhao. 2019. “Semantic Segmentation of Urban Buildings from VHR Remote Sensing Imagery Using a Deep Convolutional Neural Network.” Remote Sensing 11 (15): 1774. doi:10.3390/rs11151774.
  • You, J., L. Xiaocheng, M. Low, D. Lobell, and S. Ermon. 2017. “Deep Gaussian Process for Crop Yield Prediction Based on Remote Sensing Data.” Paper presented at the Thirty-First AAAI conference on artificial intelligence, California, USA.
  • Zhang, C., J. Dong, L. Zuo, and G. Quansheng. 2022. “Tracking Spatiotemporal Dynamics of Irrigated Croplands in China from 2000 to 2019 Through the Synergy of Remote Sensing, Statistics, and Historical Irrigation Datasets.” Agricultural Water Management 263: 107458. doi:10.1016/j.agwat.2022.107458.
  • Zhang, Q., G. Linlin, R. Zhang, G. Isabel Metternicht, D. Zheyuan, J. Kuang, and X. Min. 2021. “Deep-Learning-Based Burned Area Mapping Using the Synergy of Sentinel-1&2 Data.” Remote Sensing of Environment 264: 112575. doi:10.1016/j.rse.2021.112575.
  • Zhang, Z., Q. Liu, and Y. Wang. 2018. “Road Extraction by Deep Residual U-Net.” IEEE Geoscience and Remote Sensing Letters 15 (5): 749–753.
  • Zhang, C., P. Yue, D. Liping, and W. Zhaoyan. 2018. “Automatic Identification of Center Pivot Irrigation Systems from Landsat Images Using Convolutional Neural Networks.” Agriculture 8 (10): 147.
  • Zhao, K., J. Kang, J. Jung, and G. Sohn. 2018. “Building Extraction from Satellite Images Using Mask R-CNN with Building Boundary Regularization.” Paper presented at the CVPR Workshops, Salt Lake City, Utah.
  • Zheng, J., F. Haohuan, L. Weijia, W. Wenzhao, Y. Zhao, R. Dong, and L. Yu. 2020. “Cross-Regional Oil Palm Tree Counting and Detection via a Multi-Level Attention Domain Adaptation Network.” ISPRS Journal of Photogrammetry and Remote Sensing 167: 154–177. doi:10.1016/j.isprsjprs.2020.07.002.
  • Zhou, M., H. Sui, S. Chen, J. Wang, and X. Chen. 2020. “BT-RoadNet: A Boundary and Topologically-Aware Neural Network for Road Extraction from High-Resolution Remote Sensing Imagery.” ISPRS Journal of Photogrammetry and Remote Sensing 168: 288–306. doi:10.1016/j.isprsjprs.2020.08.019.