Research Article

Integrating bi-temporal VHR optical and long-term SAR images for built-up area change detection

Article: 2316109 | Received 27 Sep 2023, Accepted 03 Feb 2024, Published online: 20 Feb 2024

ABSTRACT

With the rapid expansion of urbanization, it is imperative to monitor built-up area changes to promote the sustainable development of cities, in line with Sustainable Development Goal 11 (SDG 11). Remote sensing big data are valuable for automatically mapping these changes. Bi-temporal very high-resolution (Bi-VHR) optical images have been widely utilized for fine-grained change detection (CD). However, the significant spatiotemporal inconsistency caused by imaging conditions and seasonal variations poses challenges for VHR optical CD. Unlike optical images, synthetic aperture radar (SAR) images are unaffected by atmospheric interference and provide robust spatiotemporal features as a supplement. Previous CD algorithms using SAR have overlooked the long-term features of the time series. In this study, we propose a novel CD framework combining long-term SAR with Bi-VHR images. It incorporates a spatial-frequency learning module to enhance SAR temporal features and a multisource feature fusion module to adaptively fuse the heterogeneous features. Experiments are conducted on the OS-BCD dataset, the first dataset specifically designed for this task. The results demonstrate that our method outperforms advanced CD methods, achieving an F1 score, IoU and OA of 64.99%, 48.13%, and 95.11%, respectively, and validating its efficacy in accurately detecting changes in built-up areas.

This article is part of the following collections:
Big Earth Data in Support of SDG 11: Sustainable Cities and Communities

1. Introduction

With the accelerating process of global urbanization, the expansion of built-up areas poses a threat to arable land quality (Kong Citation2014; Li, Zhang, and Wang Citation2023; Zeren Cetin et al. Citation2023) and leads to environmental degradation (Degerli and Çetin Citation2022a; Liu, Zhang, and Zhou Citation2018). The dynamic changes of built-up areas and their cascading impacts present significant challenges to the achievement of the United Nations Sustainable Development Goals (SDGs), especially SDG 11 (Sustainable Cities and Communities) (Cetin Citation2016; Han et al. Citation2022). Timely and accurate identification of changes in built-up areas is crucial for formulating urban policies and promoting sustainable urban development (Cetin Citation2019; Cetin et al. Citation2021). Remote sensing data, with their large-scale and multi-temporal observation capabilities, enable precise and convenient monitoring of built-up area changes, thereby contributing to the achievement of SDG 11 (Derakhshan, Cutter, and Wang Citation2020). In comparison to the time-consuming manual vectorization method, the advancement of change detection (CD) technology (Lv et al. Citation2023) with remote sensing (Degerli and Çetin Citation2022b) provides an opportunity for automated mapping of change areas (Cetin et al. Citation2023; Huang et al. Citation2023; Zeren Cetin et al. Citation2023).

In recent years, many studies focused on conducting CD tasks based on bi-temporal very high-resolution (Bi-VHR) images. These images, with spatial resolutions in the range of meters or even sub-meters, provide detailed spatial information about ground objects (Zhao et al. Citation2023). Deep learning models, known for their robust ability to extract high-level semantic features (Zhang, Zhang, and Du Citation2016), have been widely adopted for Bi-VHR change detection (Chen et al. Citation2020; Zhu et al. Citation2021). Convolutional neural networks (CNNs) combined with Siamese structures have emerged as popular approaches for learning hierarchical representations from Bi-VHR data, with a focus on capturing geometric change characteristics through shared parameters in encoders (Chen et al. Citation2022; Fang, Pan, and Kou Citation2019; Li et al. Citation2023; Song, Hua, and Li Citation2023). Recently, attention-based models have been introduced in CD tasks to explore the relevance of object representations (Chen and Shi Citation2020; Liu and Shi Citation2021). The continual improvement of network structures, such as expanding receptive fields (Xia, Xu, and Pu Citation2022) and integrating multi-layer features (Yuan et al. Citation2022; Zhuo et al. Citation2021), has further enhanced the accuracy of fine-grained CD based on VHR images.

However, the effectiveness of CD is still limited by the substantial spatiotemporal inconsistency present in VHR images. Firstly, VHR images exhibit significant spatial inconsistency, resulting in diverse data domains across different regions or time periods (Teng et al. Citation2023). This spatial variability poses challenges to the generalization performance of CD models (Tang, Wang, and Zhang Citation2022). For instance, as shown in Figure 1, factors such as the capture angle (Matasci et al. Citation2015), sensor type and solar irradiance (Wang et al. Citation2017) can influence the clarity and visibility of VHR images, leading to considerable variations in the data domain. Secondly, the temporal inconsistency of the background introduces semantic ambiguity (Lv et al. Citation2023) due to seasonal changes and limited acquisition times (Li et al. Citation2022). For example, the farmland depicted in Figure 1 undergoes visible changes as the seasons progress (e.g. crop harvesting). Such changes in visual appearance, while the geographical semantics remain unchanged, increase the likelihood of misdetection in CD networks. Therefore, it is difficult to overcome the uncertainty of the data sources mentioned above solely by optimizing the network.

Figure 1. The spatiotemporal inconsistency inherent in VHR images poses a challenge to the robustness improvement of CD models.

To address the issue of Bi-VHR data inconsistency in CD, we propose integrating Synthetic Aperture Radar (SAR) with optical images. SAR images provide long-term scattering features that complement the visual features captured by the Bi-VHR images. The backscatter of SAR images characterizes the structural information of ground objects (Saha, Bovolo, and Bruzzone Citation2021) and has been successfully applied in tasks such as built-up mapping (Liang et al. Citation2023). Unlike optical images, SAR data are not affected by atmospheric conditions such as clouds and rain, making them suitable for long-term periodic modeling (Hu and Ban Citation2014; Pirrone, Bovolo, and Bruzzone Citation2020). This allows us to account for heterogeneity caused by image quality and seasonal changes. Previous studies have demonstrated the effectiveness of using bi-temporal SAR data for CD, but they only leverage its spatial features (Fang, Pan, and Kou Citation2019; Geng et al. Citation2019; Jia et al. Citation2015), ignoring the potential of SAR images for capturing phenological information, which has not been extensively explored.

In order to couple VHR and long-term SAR data in built-up CD tasks, two key problems still need to be addressed. Firstly, it is necessary to extract global spatiotemporal information from the SAR time series. CNNs typically extract features through sliding convolution operations in the spatial domain (Gao et al. Citation2018), which makes it difficult to capture global features in both time and space. In contrast, frequency-domain features can highlight the key features of long time series by embedding global information into each frequency component (Shi et al. Citation2020). Therefore, we propose frequency-domain learning to effectively extract spatiotemporal semantic information from the SAR time series. Secondly, the effective integration of VHR and SAR images presents certain challenges. These two types of images differ in resolution, imaging principles, and feature semantics, making their integration a formidable task. In this article, we propose a novel approach that leverages a multi-level attention mechanism to fuse discriminative features extracted from VHR and SAR images.

The main contributions of this article are given as follows.

  1. To address the issue of spatiotemporal inconsistency in the Bi-VHR optical CD task, we propose a novel deep learning network combining VHR optical and long-term SAR data specifically for CD (HOLS-CDnet).

  2. In HOLS-CDnet, a spatiotemporal-frequency learning module (SFLM) and an attention-based multisource feature fusion module (MFFM) are employed to enhance the long-term feature representation capability and integrate heterogeneous discriminative features adaptively.

  3. We create and publicly release a new CD dataset, the first to comprise both Bi-VHR optical and long-term SAR images for CD, named the Optical-SAR Built-up Change Detection Dataset (OS-BCD).

The rest of this article is organized as follows: Section 2 introduces existing techniques for bi-temporal VHR change detection and multi-source remote sensing data change detection. Section 3 describes the basic information of the built-up change detection dataset constructed for this study. Section 4 introduces the structure of our proposed network and the evaluation criteria. Section 5 presents the results of the experiments conducted with our network and compares them with several classical methods. Section 6 discusses the effectiveness of SAR + VHR, and Section 7 gives our conclusions.

2. Related work

2.1. Bi-VHR change detection

In recent years, deep learning has been widely used to extract VHR fine-grained change information. Siamese neural networks are popular for their ability to flexibly extract features through differencing or concatenation (Wang, Tan, et al. Citation2020). In (Liu et al. Citation2018), an unsupervised deep convolutional coupling network was proposed for change detection of VHR heterogeneous images. Two Siamese extensions of fully convolutional networks were first proposed by Daudt, Caye, and Boulch (Citation2018), which achieved the best results on two open change detection datasets. Zhan et al. (Citation2017) introduced a deep Siamese convolutional network for aerial image change detection, in which the Siamese network learned to extract features directly from the image pairs. To further improve the performance of Siamese neural networks, Chen et al. (Citation2019) incorporated spatial and channel attention mechanisms and denser connections to fuse features. DASNet (Chen, Yuan, et al. Citation2021b), by using a dual-attention mechanism to capture long-range dependencies, could overcome the lack of resistance to pseudo-changes. A densely connected Siamese network for change detection, namely SNUNet-CD (Fang et al. Citation2022), reduced the loss of localization information in the deep layers of the network by using compact information communication between encoder and decoder. Du et al. (Citation2019) designed DSFANet, where two symmetric deep networks were used to project the input data of dual-temporal imagery. A Multi-scale Context Aggregation (MSCA) module in MSCANet was intended to adaptively gather and aggregate contextual information from various crowd sizes (Wang, Zhao, et al. Citation2020b). In a nutshell, these studies focus on highlighting foreground changes and suppressing background information by enhancing model representation capabilities.

2.2. Multisource remote sensing data change detection

Multisource remote sensing data change detection combines multiple types of remote sensing data, such as multispectral images, hyperspectral images, light detection and ranging (LiDAR) data (Lv et al. Citation2022) and radar images, to detect object changes more accurately. The combination of multisource data provides richer information about ground object changes. Some studies have integrated SAR and optical images in CD. DTCDN (Li, Du, et al. Citation2021) is a deep translation-based change detection network, which combines optical and SAR images and uses deep context features to separate unchanged pixels from changed pixels. A Siamese network (Saha et al. Citation2022) was proposed to cope with the absence of post-temporal optical images by processing bi-temporal SAR input and pre-temporal optical input separately. SAR images have also been used to alleviate the problem of cloud coverage in optical images (Li et al. Citation2022). In another work, optical and SAR images were used to segment geographical patches and automatically generate samples (Zhou et al. Citation2021). These studies have preliminarily demonstrated the feasibility of using multisource data for change detection. However, few models fuse long-term SAR and VHR optical data to obtain fine-grained CD.

3. OS-BCD: the first built-up CD dataset containing both Bi-VHR optical and long-term SAR images

3.1. Study area

The study area is Guigang City, Guangxi Zhuang Autonomous Region, China, as shown in Figure 2. It is located between 109°11′56″E and 110°39′36″E and between 22°39′16″N and 24°2′42″N, with a total area of about 1.06 × 10⁴ km² and a built-up area of more than 80 km². Due to latitude and terrain factors, Guangxi is prone to cloudy and rainy weather, making it difficult for optical images to cover the whole area at once (Yuan, Xu, et al. Citation2022). Multiple sources and different capture times result in significant differences in the VHR optical image data domain of the region.

Figure 2. The study area for collecting images and labeling of OS-BCD.

3.2. OS-BCD dataset

We constructed a dataset including 2188 pairs of Bi-VHR images and long-term SAR data with built-up area changes. The basic information of the dataset is shown in Table 1. The long-term SAR data consist of 48 bands, corresponding to two imaging phases. Each phase includes data captured over a period of 12 months, with two polarization modes recorded each month.

Table 1. Basic information of OS-BCD dataset.

Sample images are shown in Figure 3, from which we can see that the built-up area changes fall mainly into three categories: (a) from vegetation to built-up area, (b) from vegetation to bare area, and (c) from bare area to built-up area. It is obvious that the data domain of the Bi-VHR images is inconsistent, and some of the built-up area changes are subtle and susceptible to background changes. For example, the first row of Figure 3 shows a scenario of arable land harvesting that is easily identified as a transition from vegetation to bare area. The OS-BCD dataset and the code of HOLS-CDnet will be made publicly available.

Figure 3. Dataset sample examples, showing Bi-VHR and bitemporal VH of long-term SAR data and labels. (a) from vegetation to built-up area (b) from vegetation to bare area (c) from bare area to built-up area.

4. Methodology

4.1. Overview of HOLS-CDnet

This section describes the novel network (HOLS-CDnet) that integrates Bi-VHR images and long-term SAR data for built-up area change detection. As illustrated in Figure 4, the overall framework consists of three modules:

  1. Bi-VHR Siamese CD module (BSCM): we construct a Siamese convolutional neural network to extract multilevel features of the Bi-VHR images. By calculating the difference between low-level features and concatenating high-level features, the module obtains both fine-grained texture features Vs and semantic representations Vd.

Figure 4. The framework of HOLS-CDnet (a) Bi-VHR Siamese CD module (BSCM) (b) SAR spatiotemporal-frequency learning module (SFLM) (c) multisource feature fusion module (MFFM).

  2. SAR spatiotemporal-frequency learning module (SFLM): to extract temporal features from the long-term SAR data, a spatiotemporal-frequency learning block (SFLB) is introduced to combine the spatial and frequency domains. Through global frequency-domain operations and local spatial convolution operations, the SFLB enhances the ability to distinguish spatiotemporal features of the long-term SAR data. We use SFLBs to extract the low-level difference features Ss and high-level difference features Sd.

  3. Multisource feature fusion module (MFFM): in order to integrate multisource features from the optical and SAR sensors, we utilize three attention modules to fuse the multisource difference features. First, low-high level attention is used to combine Vs and Vd to obtain both texture and semantic representations of the VHR images. Second, multisource fusion attention is applied separately to the shallow and deep features of the multisource data to fuse heterogeneous features. Then, we build a change enhancement attention module that adaptively explores effective information from the feature stacks; it adaptively selects appropriate changing or invariant features from the different sensors based on channel self-attention, so as to achieve robust fusion of heterogeneous features. Finally, through a decoder, we obtain and fuse the VHR output (Output_V) and the fusion output (Output_fusion) to produce the final predicted change map.

These three modules will be introduced in detail in the following sections.

4.2. Bi-VHR siamese CD module (BSCM)

Since VHR images contain delicate spatial information, it is important to use an efficient Siamese convolutional network to extract bi-temporal features. Considering the outstanding performance of ResNet (He et al. Citation2016), we use ResNet-50 as the backbone of the Siamese network. ResNet-50 can be divided into one convolutional block and four residual blocks.

First, the Bi-VHR images $I_{T1}$ and $I_{T2}$ are fed into ResNet-50 to obtain two groups of features $\{f_T^1, \ldots, f_T^4\}$, $T = T1, T2$. In order to enhance changes between low-level features, which contain fine-grained spatial information, we use the absolute difference to extract the shallow change information. Specifically, we obtain the shallow change feature $V_s$ by applying the difference function $|f_{T1}^2 - f_{T2}^2|$. In order to exploit high-level features, which usually contain semantic representations, we obtain the deep change feature $V_d$ by performing the concatenation $[f_{T1}^4, f_{T2}^4]$.
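As a rough illustration of this pipeline, the sketch below builds a weight-sharing ResNet-50 encoder and forms Vs by the absolute difference of the second-stage features and Vd by concatenating the fourth-stage features. The torchvision backbone call, the stage indices and the tensor sizes in the usage example are our assumptions for illustration, not the released code.

```python
# Minimal BSCM-style sketch (assumptions: torchvision ResNet-50 backbone,
# Vs from stage-2 features, Vd from stage-4 features).
import torch
import torch.nn as nn
from torchvision.models import resnet50


class BSCM(nn.Module):
    """Siamese ResNet-50 encoder producing shallow (Vs) and deep (Vd) change features."""

    def __init__(self):
        super().__init__()
        backbone = resnet50(weights=None)
        # Stem followed by the four residual stages of ResNet-50.
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool)
        self.stages = nn.ModuleList(
            [backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4])

    def encode(self, x):
        feats = []
        x = self.stem(x)
        for stage in self.stages:
            x = stage(x)
            feats.append(x)                      # f^1 ... f^4
        return feats

    def forward(self, img_t1, img_t2):
        f1 = self.encode(img_t1)                 # shared weights: branch T1
        f2 = self.encode(img_t2)                 # shared weights: branch T2
        v_s = torch.abs(f1[1] - f2[1])           # shallow change feature Vs (difference)
        v_d = torch.cat([f1[3], f2[3]], dim=1)   # deep change feature Vd (concatenation)
        return v_s, v_d


if __name__ == "__main__":
    model = BSCM()
    t1, t2 = torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256)
    v_s, v_d = model(t1, t2)
    print(v_s.shape, v_d.shape)                  # (1, 512, 32, 32) and (1, 4096, 8, 8)
```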

4.3. SAR spatiotemporal-frequency learning module (SFLM)

CNNs, which learn in the spatial domain, are widely used for local information extraction but are not efficient at capturing global information. However, when an image is transformed from the spatial domain to the frequency domain by Fourier analysis, global information is embedded into each frequency component. Learning in the frequency domain means learning a self-adaptive filter that preserves useful information globally, which is a cost-efficient way of grasping global information (Jia and Yao Citation2023; Zhang et al. Citation2022). Considering the advantage of extracting local and global information simultaneously, some studies have employed joint spatial-frequency learning in deep learning (Wu et al. Citation2019; Yang et al. Citation2023; Zhou et al. Citation2022). Since long-term SAR data contain enormous spatiotemporal information, it is critical to effectively extract their periodic and mutative characteristics. In the frequency domain, the periodic feature of the background is considered a low-frequency signal, while the changing feature is considered a high-frequency signal. Therefore, temporal global features can be extracted from the overall time series by frequency-domain feature calculation, while local features in the spatial domain can effectively locate changing positions. We therefore introduce the SFLB, a classical block for joint frequency-spatial learning (Jia and Yao Citation2023), to extract global and local information efficiently.

The long-term SAR data are first divided into two temporal phases, and the data from the two phases are processed using SFLBs. To enhance the change features between time 1 and time 2, the absolute difference between the SFLB outputs is computed to obtain Ss. To combine the semantic representations of the deep features, a concatenation is also performed on the outputs, after which the SFLB is applied again to obtain Sd.

The process of the SFLB is described in detail in Figure 5. The SFLB is divided into two main parts. One part learns parameters in the spatial domain without Fourier transformation: the data are directly fed into two convolution layers to obtain change information in the local spatial domain. The other part learns filter parameters in the frequency domain. Initially, the SAR data are transformed from the spatial domain to the frequency domain using the Fast Fourier Transformation (FFT). The real and imaginary parts are then concatenated along the channel dimension and forwarded into two convolution layers. After that, the data are transformed back to the spatial domain using the Inverse Fast Fourier Transformation (IFFT). Finally, the outputs of the spatial and frequency domains are concatenated, and a convolution is applied to compress and restore the result to its original dimensions.
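The following is a minimal sketch of an SFLB-style block following the description above: one branch applies two convolutions in the spatial domain, the other applies convolutions to the concatenated real and imaginary FFT components before transforming back, and a final 1×1 convolution fuses the two branches. Kernel sizes, channel widths, activations and the FFT normalization are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch of an SFLB-style spatial/frequency joint-learning block (assumed settings).
import torch
import torch.nn as nn


class SFLB(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Spatial branch: two plain convolutions on the raw feature map.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Frequency branch: convolutions on the concatenated real/imaginary parts.
        self.freq = nn.Sequential(
            nn.Conv2d(2 * channels, 2 * channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(2 * channels, 2 * channels, 3, padding=1),
        )
        # Fuse the two branches back to the original channel count.
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        s = self.spatial(x)                                   # local spatial features
        spec = torch.fft.fft2(x, norm="ortho")                # FFT to the frequency domain
        f = torch.cat([spec.real, spec.imag], dim=1)          # stack real/imag on channels
        f = self.freq(f)
        real, imag = torch.chunk(f, 2, dim=1)
        f = torch.fft.ifft2(torch.complex(real, imag), norm="ortho").real  # back to spatial
        return self.fuse(torch.cat([s, f], dim=1))            # concat + 1x1 conv to restore dims


if __name__ == "__main__":
    block = SFLB(channels=24)                                 # e.g. one 24-band SAR phase
    out = block(torch.randn(1, 24, 64, 64))
    print(out.shape)
```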

Figure 5. Detailed description of spatiotemporal-frequency learning block (SFLB) in SFLM. FFT and IFFT mean Fast Fourier Transformation and Inverse Fast Fourier Transformation.

4.4. Multisource feature fusion module (MFFM)

To obtain discriminative change information from the multisource data, the MFFM, comprising low-high level attention, multisource fusion attention and change enhancement attention, is proposed. First, low-high level attention is used to adaptively integrate the shallow and deep change information of the Bi-VHR images. Then, to combine the multisource features obtained from the Bi-VHR and long-term SAR data, we use multisource fusion attention and change enhancement attention to integrate them adaptively. The low-high level attention and change enhancement attention are composed of channel attention (CA) and spatial attention (SA) (Woo et al. Citation2018), while the multisource fusion attention mainly consists of CA. CA and SA model the correlations between multilevel and multisource features across channels and spatial locations, yielding a more discriminative fused feature.

To be specific, in low-high level attention we first concatenate $V_s$ and $V_d$ and forward the result through CA and SA to obtain the channel-spatial-refined difference feature $F_V''$. These processes can be described as follows:
(1) $F_V = [V_s, V_d]$
(2) $F_V' = CA(F_V) \otimes F_V$
(3) $F_V'' = SA(F_V') \otimes F_V'$
where $[\cdot,\cdot]$ denotes concatenation and $\otimes$ denotes element-wise multiplication.

In multisource fusion attention, we first concatenate $V_s$ and $S_s$ and feed them to CA to obtain the channel-refined shallow feature $F_s'$. We then perform the same operations on $V_d$ and $S_d$ to obtain the channel-refined deep feature $F_d'$. These processes can be expressed as follows:
(4) $F_s = [V_s, S_s]$
(5) $F_s' = CA(F_s) \otimes F_s$
(6) $F_d = [V_d, S_d]$

(7) $F_d' = CA(F_d) \otimes F_d$
In change enhancement attention, we concatenate $F_s'$ and $F_d'$ and apply both CA and SA to obtain the channel-spatial-refined fusion feature $F_{fuse}''$, which can be described as follows:
(8) $F_{fuse} = [F_s', F_d']$
(9) $F_{fuse}' = CA(F_{fuse}) \otimes F_{fuse}$
(10) $F_{fuse}'' = SA(F_{fuse}') \otimes F_{fuse}'$

The mechanisms of CA and SA are shown in Figure 6. CA is calculated as follows:
(11) $CA = \sigma\big(\mathrm{MLP}_r(\mathrm{AvgPool}(F)) + \mathrm{MLP}_r(\mathrm{MaxPool}(F))\big)$
where $F \in \mathbb{R}^{C \times H \times W}$ denotes the input feature. $\mathrm{AvgPool}(\cdot)$ and $\mathrm{MaxPool}(\cdot)$ represent the average pooling and max pooling layers, respectively; each produces a vector of size $C \times 1 \times 1$. $\mathrm{MLP}_r(\cdot)$ is a weight-sharing multi-layer perceptron with a hidden layer of channel reduction ratio $r$, after which the two resulting vectors are added and then activated by a sigmoid function $\sigma$ to obtain CA. Similarly to CA, SA can be described as follows:
(12) $SA = \sigma\big(f^{k \times k}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)])\big)$
where $f^{k \times k}$ denotes a convolution with a kernel size of $k$, and the average pooling and max pooling layers are applied along the channel dimension to obtain two maps of size $1 \times H \times W$.
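Below is a compact sketch of the CBAM-style CA and SA described by Equations (11) and (12); the reduction ratio r = 8 and kernel size k = 7 follow the settings reported in Section 5.1, while the channel count in the usage example is an arbitrary placeholder.

```python
# Sketch of CBAM-style channel and spatial attention (Woo et al. 2018); r=8, k=7
# follow Section 5.1, other details are illustrative assumptions.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, channels, r=8):
        super().__init__()
        # Weight-sharing MLP with one hidden layer of C/r units (Eq. 11).
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // r, 1, bias=False), nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))      # AvgPool branch
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))       # MaxPool branch
        return torch.sigmoid(avg + mx)                                # C x 1 x 1 attention map


class SpatialAttention(nn.Module):
    def __init__(self, k=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2, bias=False)    # k x k conv (Eq. 12)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)                      # channel-wise average
        mx, _ = torch.max(x, dim=1, keepdim=True)                     # channel-wise max
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # 1 x H x W attention map


if __name__ == "__main__":
    # Refining the concatenated feature F_V = [Vs, Vd] as in Eqs. (1)-(3).
    f_v = torch.randn(1, 512, 64, 64)
    ca, sa = ChannelAttention(512), SpatialAttention()
    f_v1 = ca(f_v) * f_v        # F_V'  = CA(F_V)  (x) F_V
    f_v2 = sa(f_v1) * f_v1      # F_V'' = SA(F_V') (x) F_V'
    print(f_v2.shape)
```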

Figure 6. Detailed description of channel attention and spatial attention in MFFM.

As for the decoder, we introduce the atrous spatial pyramid pooling (ASPP) commonly used in DeepLabV3 (Chen et al. Citation2017), which resamples features at different rates so that change areas of arbitrary scale can be identified accurately and efficiently. The improved ASPP includes one 1×1 convolution and three 3×3 convolutions with different atrous rates, as well as an image pooling block. We thus capture multilevel change features using ASPP, fuse the five outputs of the improved ASPP by concatenation, and feed them to a 1×1 convolution and a bilinear upsampling layer to restore the label size. Finally, we obtain $Output_V$ and $Output_{fusion}$.

Finally, in order to take advantage of the VHR branch and the multisource fusion branch simultaneously, a hyper-parameter $w$ is introduced to combine $Output_V$ and $Output_{fusion}$. We obtain the final change map $changemap_{final}$, which can be expressed as follows:
(13) $changemap_{final} = w \times Output_V + (1 - w) \times Output_{fusion}$
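To make the decoder stage concrete, here is an illustrative ASPP-style head followed by the weighted fusion of Equation (13). The atrous rates (6, 12, 18), the output channel width, the use of one head per branch with identical settings, and w = 0.8 (taken from the sensitivity study in Section 6.2) are assumptions for the sketch.

```python
# Illustrative ASPP-style decoder head plus the weighted fusion of Eq. (13);
# rates, channel widths and w = 0.8 are assumed settings.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ASPPHead(nn.Module):
    def __init__(self, in_ch, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        branches = [nn.Conv2d(in_ch, out_ch, 1)]                       # 1x1 branch
        branches += [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates]
        self.branches = nn.ModuleList(branches)
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, out_ch, 1))
        self.project = nn.Conv2d(5 * out_ch, 1, 1)                     # compress to change logits

    def forward(self, x, out_size):
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.pool(x), size=x.shape[-2:], mode="bilinear",
                               align_corners=False)
        logits = self.project(torch.cat(feats + [pooled], dim=1))
        return F.interpolate(logits, size=out_size, mode="bilinear", align_corners=False)


if __name__ == "__main__":
    f_v, f_fuse = torch.randn(1, 512, 16, 16), torch.randn(1, 512, 16, 16)
    head_v, head_fuse = ASPPHead(512), ASPPHead(512)
    output_v = torch.sigmoid(head_v(f_v, out_size=(256, 256)))
    output_fusion = torch.sigmoid(head_fuse(f_fuse, out_size=(256, 256)))
    w = 0.8                                                            # best w per Section 6.2
    change_map_final = w * output_v + (1 - w) * output_fusion          # Eq. (13)
    print(change_map_final.shape)
```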

4.5. Loss function

Two loss functions are used as the objective function to optimize the performance of HOLS-CDnet (Li, He, et al. Citation2022). One is the Binary Cross Entropy Loss (BCELoss), which is usually utilized to measure the error between the target and the predicted probabilities in binary classification. The other is the Dice loss, which has proved useful in overcoming the class imbalance in CD. They can be described as follows:
(14) $BCELoss = -\big(y \log \hat{y} + (1 - y)\log(1 - \hat{y})\big)$
(15) $DiceLoss = 1 - \frac{1}{m}\sum_{j=1}^{m} \frac{2\sum_{i=1}^{N} \hat{y}_{i,j}\, y_{i,j}}{\sum_{i=1}^{N} \hat{y}_{i,j} + \sum_{i=1}^{N} y_{i,j}}$
where $\hat{y}$ and $y$ denote the predicted change map and the target, respectively, whose values are either 1 or 0. Finally, the objective function is as follows:
(16) $Loss = \lambda \, BCELoss + (1 - \lambda)\, DiceLoss$
where $\lambda$ serves as a balancing factor, regulating the weights assigned to BCELoss and DiceLoss within the overall loss function. The permissible range of $\lambda$ lies between 0 and 1.
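A minimal sketch of the combined objective in Equations (14)-(16) is given below, assuming the network outputs per-pixel change probabilities; λ = 0.5 follows the sensitivity analysis in Section 6.2, and the epsilon guard is an implementation detail we add.

```python
# Sketch of the combined BCE + Dice objective in Eqs. (14)-(16); lambda = 0.5 per Section 6.2.
import torch
import torch.nn.functional as F


def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss averaged over the batch (Eq. 15)."""
    pred = pred.flatten(1)       # (B, N) predicted change probabilities
    target = target.flatten(1)   # (B, N) binary labels
    inter = (pred * target).sum(dim=1)
    union = pred.sum(dim=1) + target.sum(dim=1)
    return 1.0 - (2.0 * inter / (union + eps)).mean()


def cd_loss(pred, target, lam=0.5):
    """Total loss = lambda * BCELoss + (1 - lambda) * DiceLoss (Eq. 16)."""
    bce = F.binary_cross_entropy(pred, target)
    return lam * bce + (1.0 - lam) * dice_loss(pred, target)


if __name__ == "__main__":
    pred = torch.rand(2, 1, 256, 256)            # probabilities from a sigmoid output
    target = torch.rand(2, 1, 256, 256).round()  # binary change labels
    print(cd_loss(pred, target).item())
```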

5. Experimental result

5.1. Implementation details

We conducted all experiments on OS-BCD, which was randomly divided into training/validation/test sets at a ratio of 7:1:2, comprising 1531/219/438 pairs of Bi-VHR and long-term SAR images, respectively. All experiments were implemented under the PyTorch framework, and a GeForce RTX 2080 GPU was used to train the proposed network. We adopted Adam to optimize the model; the learning rate was initially set to 0.0005 and then decreased by cosine annealing. For the hyperparameters of the attention mechanisms, we set the reduction ratio r in CA and the kernel size k in SA to 8 and 7, respectively, a setting shown to be effective in Shi et al. (Citation2022). We trained each network for 100 epochs and validated the trained models on the validation set, from which the best model was selected for evaluation on the test set.
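For orientation, a self-contained sketch of this optimization setup (Adam with an initial learning rate of 5e-4 and cosine annealing over 100 epochs) is shown below; the tiny convolutional model and random tensors merely stand in for HOLS-CDnet and the OS-BCD loader, and the plain BCE loss stands in for the combined loss of Section 4.5.

```python
# Sketch of the training configuration (Adam, lr 5e-4, cosine annealing, 100 epochs);
# the model, data and loss below are placeholders, not HOLS-CDnet itself.
import torch
import torch.nn as nn

model = nn.Conv2d(3, 1, 3, padding=1)                       # placeholder for HOLS-CDnet
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    x, y = torch.randn(4, 3, 64, 64), torch.rand(4, 1, 64, 64).round()  # dummy batch
    pred = torch.sigmoid(model(x))
    loss = nn.functional.binary_cross_entropy(pred, y)      # stand-in for BCE + Dice
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                                         # cosine learning-rate decay
```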

5.2. Evaluation metrics

To evaluate the effectiveness of the different methods on the test set, we chose several metrics, including Precision, Recall, F1, IoU and overall accuracy (OA). The metrics are defined as follows:

(17) $Precision = \frac{TP}{TP + FP}$
(18) $Recall = \frac{TP}{TP + FN}$
(19) $F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$
(20) $IoU = \frac{TP}{FN + TP + FP}$
(21) $OA = \frac{TP + TN}{TP + TN + FP + FN}$
where TP, FP, TN, and FN denote the numbers of true positive, false positive, true negative, and false negative pixels, respectively.
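As a reference, the following small function shows how the pixel-wise metrics of Equations (17)-(21) could be computed from binary change maps; the epsilon guard against empty classes is our addition.

```python
# Sketch of the evaluation metrics in Eqs. (17)-(21) for binary change maps.
import numpy as np


def cd_metrics(pred, label, eps=1e-12):
    """Return Precision, Recall, F1, IoU and OA for binary arrays of equal shape."""
    pred, label = pred.astype(bool), label.astype(bool)
    tp = np.sum(pred & label)          # true positive pixels
    fp = np.sum(pred & ~label)         # false positive pixels
    fn = np.sum(~pred & label)         # false negative pixels
    tn = np.sum(~pred & ~label)        # true negative pixels
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    iou = tp / (tp + fp + fn + eps)
    oa = (tp + tn) / (tp + tn + fp + fn + eps)
    return precision, recall, f1, iou, oa


if __name__ == "__main__":
    pred = np.random.rand(256, 256) > 0.5
    label = np.random.rand(256, 256) > 0.5
    print(cd_metrics(pred, label))
```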

5.3. Comparative experiments

5.3.1. Comparative CD methods

To demonstrate the superiority of the proposed method, several state-of-the-art CD methods were selected for comparison. Given that these methods were designed for VHR images, we only used Bi-VHR as the input. The methods are listed as follows:

  1. MTCDN (Du et al. Citation2024): it integrates a CD network into a generative adversarial network (GAN) for optical and SAR images. It employs two generators and discriminators for each image domain, utilizing multitask learning to combine image identification and change detection. An end-to-end framework is also used in the prediction and training stages to reduce cost.

  2. FC-Siam-diff (Daudt, Caye, and Boulch Citation2018): it uses a Siamese encoder to extract features from Bi-VHR, and utilizes an absolute difference to combine two temporal features, which are then connected to the decoder.

  3. ISNet (Cheng, Wang, and Han Citation2022): it employs margin maximization to delineate the distinction between changed and unchanged semantics. It also implements a targeted arrangement of attention mechanisms to guide the utilization of channel attention (CA) and spatial attention (SA).

  4. BIT-CD (Chen, Qi, and Shi Citation2021): it introduces a bitemporal image transformer (BIT) designed to proficiently capture contextual information within the spatial–temporal domain.

  5. BiDateNet (Papadomanolaki et al. Citation2019): it is a fully convolutional network based on U-Net. It introduces Long Short-Term Memory (LSTM) convolutional blocks into the skip connection.

  6. DASNet (Chen, Yuan, et al. Citation2021b): it employs a dual attention mechanism to capture long-range dependencies, and a weighted double-margin contrastive loss is proposed to address the problem of imbalanced samples.

  7. SNUNet-CD (Fang et al. Citation2022): it compacts information transmission between encoder and decoder, and between decoder and decoder. In addition, an ensemble Channel Attention Module (ECAM) is proposed for deep supervision.

  8. MSCANet (Liu et al. Citation2022): it utilizes a CNN-based feature extractor to capture hierarchical features, and designs a transformer-based MSCA to encode and aggregate context information. A deep supervision strategy is also used in the network.

5.3.2. Comparative experimental result

To display the superiority of HOLS-CDnet, it was compared with several representative CD methods on the test set. As shown in Table 2, HOLS-CDnet achieves the highest F1/IoU/OA of 64.99/48.13/95.11%, which are 2.79/3.00/0.17% higher than those of the second-best method, proving the effectiveness of integrating long-term SAR data with Bi-VHR images in built-up CD. Although the precision of HOLS-CDnet is lower than its recall, it is acceptable to recall more changed pixels at the expense of precision. MSCANet, which combines Transformer with CNN, is the second-ranked method, and BiDateNet, which utilizes CNN and LSTM, ranks third. Transformer and LSTM both capture long-range contextual information, which strengthens feature representations and improves detection performance. ISNet has a high recall and a low precision, showing that margin maximization may capture more false alarms. The mediocre performance of BIT-CD indicates that enhancing features extracted by ResNet with BIT is not effective here. Among the baseline Siamese methods, FC-Siam-diff outperforms SNUNet-CD on F1 and IoU; the reason may be that dense skip connections capture excessive change information, resulting in a high recall and a low precision. DASNet, the only method using a distance map to identify changes, does not produce a satisfactory result, suggesting that a distance map struggles to determine the boundaries of change accurately. MTCDN, the only method that utilizes both optical and SAR data, performs worst. The reason may be that it is hard for the GAN in MTCDN to generate and fuse cross-resolution optical and SAR features.

Table 2. The result of different CD methods on the test set.

The change detection results on the test set are visualized in Figure 7. Rows 1 and 5 contain built-up area changes with complex backgrounds that can easily be detected as pseudo changes. In these cases, HOLS-CDnet has the benefits of edge continuity and regional integrity, while the boundaries provided by methods (d)-(h) are more fragmented. The change areas detected by methods (i)-(k) are not delicate enough, and there are more missed or false detections. Rows 2-4 and the last row are simple cases with more obvious built-up area changes. HOLS-CDnet shows robust background suppression in these scenarios, whereas the other methods exhibit obvious omissions or false alarms. In general, HOLS-CDnet has a good detection effect in simple change scenarios and fewer missed and false detections in complex change scenarios.

Figure 7. Visualization of change detection results of each method on the test set (a) image1 (b) image2 (c) label (d) MTCDN (e) FC-Siam-diff (f) ISNet (g) BIT (h) SNUNet-CD (i) MSCANet (j) BiDateNet (k) DASNet (l) HOLS-CDnet.

5.4. Ablation experiments

5.4.1. The effect of SAR, MFFM and SFLM

To verify the effectiveness of introducing SAR data, MFFM and SFLM, we designed several ablation experiments. In Table 3, 'VHR' denotes the network that contains only BSCM, without MFFM and SFLM; 'SAR' represents replacing the SFLB with a simple CNN module that extracts long-term SAR features only in the spatial domain; 'MFFM' and 'SFLM' are the modules described in Sections 4.4 and 4.3, respectively. The network that uses all four modules is the proposed method, HOLS-CDnet.

Table 3. Ablation experiments on four modules.

The network that incorporates all the modules demonstrates superior performance, as evidenced by the results presented in Table 3. Compared with using 'VHR' only, the F1 and IoU of HOLS-CDnet are improved by 4.84% and 5.12%, verifying the validity of all the modules. When 'SAR' is introduced directly, the recall and precision are better balanced. Utilizing 'MFFM' to combine 'VHR' and 'SAR' further improves the performance. Introducing 'SFLM' on top of the three modules above yields the optimal result, indicating that integrating long-term SAR data with VHR images calls for a dedicated feature extractor and an elaborate fusion module. The combination of MFFM and SFLM is more effective in improving the CD result of 'VHR' than simply introducing the long-term SAR data.

As depicted in Figure 8, the second and last rows illustrate that using all modules leads to improved regional integrity. This observation indicates that the incorporation of the MFFM and SFLM modules enhances the recognition capability of the baseline network. The 'VHR' variant tends to detect more pseudo changes, and 'VHR + SAR' is inclined to omit some marginal changes, while HOLS-CDnet captures built-up area changes more accurately. This result demonstrates that MFFM and SFLM can combine the advantages of Bi-VHR and long-term SAR data, obtaining more discriminative feature representations.

Figure 8. Visualization of ablation experiments’ result (a) image1 (b) image2 (c) label (d) 'VHR’ (e) 'VHR + SAR’ (f) 'VHR + SAR'+'MFFM’ (g) 'VHR + SAR'+'MFFM'+'SFLM’.

5.4.2. The effect of attention and ASPP

To verify the effectiveness of introducing Attention and ASPP into MFFM, we conducted several ablation experiments, the results of which are shown in Table 4. It is evident that the combined approach integrating the ASPP and Attention modules yields the best performance: the F1 value reaches 64.99% and the IoU is 48.13%. In comparison to using only the ASPP module, both metrics demonstrate an improvement of 8%. The network without the Attention module exhibits the poorest results, underscoring the significance of the Attention module in enhancing network performance.

Table 4. Ablation experiments on Attention and ASPP.

Figure 9 illustrates the specific effects of the ablation experiments. From the second and fourth rows of the figure, it can be observed that the complete MFFM module detects more comprehensive changes, with more accurate edge information. In contrast, the changes detected using only the ASPP module exhibit smoother edges but fail to characterize the correct change regions effectively. Results obtained solely with the Attention module show more instances of missed detection, particularly evident in the fourth row, emphasizing the improved recognition accuracy achieved through the incorporation of the Attention module.

Figure 9. Visualization of ablation experiments' result. 'Attention' + 'ASPP' is equivalent to 'MFFM'. (a) image1 (b) image2 (c) label (d) 'VHR + SAR' + 'SFLM' + 'Attention' (e) 'VHR + SAR' + 'SFLM' + 'ASPP' (f) 'VHR + SAR' + 'SFLM' + 'MFFM'.

6. Discussion

6.1. The effectiveness of SAR combined with VHR in built-up CD

6.1.1. SAR in built-up CD

Some studies have shown that SAR data can be used for change detection (Jia and Yao Citation2023). We designed experiments that use either a simple CNN or the SFLM to process the long-term SAR data. As shown in Table 5, although the experiments prove that SFLM effectively improves F1/IoU by 3.45/2.53%, neither of the two methods using long-term SAR data alone can achieve the accuracy obtained with VHR images, which may be attributed to the coarse spatial resolution and speckle noise of the long-term SAR data, as well as its enormous amount of temporal information.

Table 5. Result of using long-term SAR alone as the input.

6.1.2. ‘VHR + SAR’ in built-up CD

Currently, most networks are designed for Bi-VHR, and none of them integrate long-term SAR data with Bi-VHR images for CD. The simplest way to do so without modifying classic CD methods is to concatenate the long-term SAR data and Bi-VHR images. We designed experiments to test the effectiveness of simple concatenation in CD methods, and the results are shown in Table 6. ‘Concat’ indicates that the long-term SAR data and Bi-VHR images are fused by concatenation as the input, and ‘proposed’ represents the method proposed in this study.

Table 6. The result of methods integrating long-term SAR data, ‘↓’ and ‘↑’ denote the decline and rise compared with using Bi-VHR images alone, respectively.

It is obvious that simple concatenation decreases the performance of CD tremendously. The reason might be that, since the long-term SAR data have numerous channels, simple concatenation leads the network to pay more attention to the long-term SAR data, making it difficult to fully utilize the important spatial information of the Bi-VHR images. However, the proposed method improves the results on all metrics compared with using Bi-VHR images alone, which proves that using multiple branches to process the two modalities is more suitable and that MFFM is useful for integrating Bi-VHR images and long-term SAR data.

6.1.3. Bi-SAR or long-term SAR in built-up CD

To further prove the superiority of long-term SAR data, we conducted experiments based on the HOLS-CDnet framework. In these experiments, ‘SAR(1VV, 1VH)’ denotes the use of bi-temporal optical and bi-temporal short-term SAR data, with each short-term SAR time phase comprising one VV and one VH band acquired in December of the corresponding year. In contrast, ‘long-term SAR’ indicates the use of bi-temporal optical and bi-temporal long-term SAR data, where each long-term SAR time phase comprises 24 bands (VV and VH bands for each of the 12 months), totaling 48 bands. The experimental results presented in Table 7 clearly demonstrate a significant improvement in F1 and IoU for the long-term SAR method compared to the bi-temporal short-term SAR approach, supporting the validity of our experimental design.

Table 7. Experimental results of using long-term SAR and short-term SAR.

The change detection results of the two methods are visualized in Figure 10. As is clearly evident from the first and last rows, the change areas detected using only bi-temporal short-term SAR appear noticeably fragmented, and the detected edges lack precision. In comparison, long-term SAR captures more comprehensive change information and suppresses false positives. This result effectively demonstrates the necessity of employing the long-term SAR method.

Figure 10. Visualization of change detection results of using long-term SAR and short-term SAR. (a) image1 (b) image2 (c) label (d) short-term SAR with 1VV, 1VH for each temporal phase (e) long-term SAR.

6.2. Sensitivity of the hyper-parameters w and λ

w is a hyperparameter used to fuse $Output_V$ and $Output_{fusion}$ in Equation (13). To find the best value of w, we varied it from 0.1 to 0.9 with an interval of 0.1. Table 8 reveals that the optimum overall performance occurs when w is 0.8. When w is less than 0.4, the results improve as w increases; when w is greater than 0.4, the results fluctuate as the value increases.

Table 8. Sensitivity experiment of w.

λ is the balancing factor in the loss function of Equation (16), used to adjust the weight between BCELoss and DiceLoss. To find the best value of λ, we varied it from 0.1 to 0.9 in steps of 0.1. Table 9 shows that when λ is 0.5, F1 and IoU reach their highest values of 64.99% and 48.13%, respectively. When λ is less than 0.4, the results become worse as λ increases; when λ is greater than 0.5, F1 and IoU fluctuate around 64.5% and 47.5%, respectively.

Table 9. Sensitivity experiment of λ.

6.3. Time costs and number of parameters

In our experiments, we measured the floating point operations (FLOPs), the number of parameters, and the prediction time on the dataset. FLOPs is a metric used to quantify the computational complexity of a neural network model; it represents the total number of floating-point operations, such as additions and multiplications, performed by the model in a single forward pass. Table 10 presents the calculated results.

Table 10. Time costs and number of parameters.

Among all the models compared, the FLOPs of HOLS-CDnet is 221.67G, which means that our model requires more computing resources. In terms of the number of parameters, our proposed network has far more parameters than the other networks. However, in terms of time consumption, our network takes only 0.02 s. Our model constructs a dual-branch input structure specifically designed for fusing long-term SAR and optical images, which requires significant computational resources; if networks such as DASNet also adopted a dual-branch encoder structure, their parameter counts would be similar. Meanwhile, although our model has more parameters, its actual inference time is not inferior to that of smaller networks.

7. Conclusion

In order to achieve the objectives of SDG11, accurate detection of changes in built-up areas is of vital importance for the sustainable development of cities. In this study, we address the challenge of data domain variations and background heterogeneity in built-up CD by combining long-term SAR with Bi-VHR images. Our proposal integrates multisource data through the use of a novel network called HOLS-CDnet, which employs SFLM to capture global changes from long-term SAR data, and MFFM to effectively integrate features from multiple sources. Experimental results demonstrate that our method outperforms state-of-the-art approaches, achieving the highest F1 score and IoU of 64.99% and 48.13%, respectively. Ablation experiments further validate the contributions of incorporating long-term SAR data, utilizing SFLM, and employing MFFM. These enhancements enable the network to effectively suppress background information and ensure regional integrity in the CD task. Furthermore, we acknowledge that relying solely on long-term SAR data for CD is challenging due to its coarse spatial resolution and speckle noise. Our approach achieves more robust performance by leveraging the complementary features of VHR and SAR data. Moving forward, we aim to further investigate the efficacy of integrating VHR images and SAR data in other change detection tasks. Additionally, we plan to delve deeper into the potential applications of built-up CD to support urban development decisions for SDG11.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The dataset and the code of HOLS-CDnet are available at https://github.com/Lihy256/HOLS-CDnet.

Additional information

Funding

This study was supported in part by the National Key R&D Program of China under Grant 2022YFB3903402, in part by the National Natural Science Foundation of China under Grant 42222106, in part by the National Natural Science Foundation of China under Grant 61976234, and in part by the Fundamental Research Funds for the Central Universities, Sun Yat-sen University under Grant 22lgqb12.

References

  • Cetin, Mehmet. 2016. “Sustainability of Urban Coastal Area Management: A Case Study on Cide.” Journal of Sustainable Forestry 35 (7): 527–541. https://doi.org/10.1080/10549811.2016.1228072.
  • Cetin, Mehmet. 2019. “The Effect of Urban Planning on Urban Formations Determining Bioclimatic Comfort Area’s Effect Using Satellitia Imagines on air Quality: A Case Study of Bursa City.” Air Quality, Atmosphere & Health 12 (10): 1237–1249. https://doi.org/10.1007/s11869-019-00742-4.
  • Cetin, Mehmet, Talha Aksoy, Saye Nihan Cabuk, Muzeyyen Anil Senyel Kurkcuoglu, and Alper Cabuk. 2021. “Employing Remote Sensing Technique to Monitor the Influence of Newly Established Universities in Creating an Urban Development Process on the Respective Cities.” Land use Policy 109: 105705. https://doi.org/10.1016/j.landusepol.2021.105705.
  • Cetin, Mehmet, Hakan Sevik, Ismail Koc, and Ilknur Zeren Cetin. 2023. “The Change in Biocomfort Zones in the Area of Muğla Province in Near Future due to the Global Climate Change Scenarios.” Journal of Thermal Biology 112: 103434. https://doi.org/10.1016/j.jtherbio.2022.103434.
  • Chen, Pan, Cong Li, Bing Zhang, Zhengchao Chen, X. Yang, Kaixuan Lu, and Lina Zhuang. 2022. “A Region-Based Feature Fusion Network for VHR Image Change Detection.” Remote Sensing 14: 5577. https://doi.org/10.3390/rs14215577.
  • Chen, Liang-Chieh, George Papandreou, Florian Schroff, and Hartwig Adam. 2017. “Rethinking Atrous Convolution for Semantic Image Segmentation.” ArXiv, https://doi.org/10.48550/arXiv.1706.05587.
  • Chen, Hao, Zipeng Qi, and Zhenwei Shi. 2021a. “Remote Sensing Image Change Detection With Transformers.” IEEE Transactions on Geoscience and Remote Sensing 60: 1–18. https://doi.org/10.1109/TGRS.2020.3034752.
  • Chen, Hao, and Zhenwei Shi. 2020. “A Spatial-Temporal Attention-Based Method and a New Dataset for Remote Sensing Image Change Detection.” Remote Sensing 12 (10): 1662. https://doi.org/10.3390/rs12101662.
  • Chen, Hongruixuan, Chen Wu, Bo Du, and Liangpei Zhang. 2019. “Deep Siamese Multi-Scale Convolutional Network for Change Detection in Multi-Temporal VHR Images.” 2019 10th International Workshop on the Analysis of Multitemporal Remote Sensing Images (MultiTemp), 1–4. https://doi.org/10.1109/Multi-Temp.2019.8866947.
  • Chen, Hongruixuan, Chen Wu, Bo Du, Liangpei Zhang, and Le Wang. 2020. “Change Detection in Multisource VHR Images via Deep Siamese Convolutional Multiple-Layers Recurrent Neural Network.” IEEE Transactions on Geoscience and Remote Sensing 58 (4): 2848–2864. https://doi.org/10.1109/TGRS.2019.2956756.
  • Chen, Jie, Ziyang Yuan, Jian Peng, Li Chen, Haozhe Huang, Jiawei Zhu, Yu Liu, and Haifeng Li. 2021b. “DASNet: Dual Attentive Fully Convolutional Siamese Networks for Change Detection in High-Resolution Satellite Images.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14: 1194–1206. https://doi.org/10.1109/JSTARS.2020.3037893.
  • Cheng, Gong, Guangxing Wang, and Junwei Han. 2022. “ISNet: Towards Improving Separability for Remote Sensing Image Change Detection.” IEEE Transactions on Geoscience and Remote Sensing 60: 1–11. https://doi.org/10.1109/tgrs.2022.3174276.
  • Daudt, R., B. Le Saux Caye, and A. Boulch. 2018. “Fully Convolutional Siamese Networks for Change Detection.” 2018 25th IEEE International Conference on Image Processing (ICIP): 4063–4067. https://doi.org/10.1109/ICIP.2018.8451652.
  • Degerli, Burcu, and Mehmet Çetin. 2022a. “Evaluation from Rural to Urban Scale for the Effect of NDVI-NDBI Indices on Land Surface Temperature, in Samsun, Türkiye.” Turkish Journal of Agriculture - Food Science and Technology 10 (10): 2446–2452. https://doi.org/10.24925/turjaf.v10i12.2446-2452.5535.
  • Degerli, Burcu, and Mehmet Çetin. 2022b. “Using the Remote Sensing Method to Simulate the Land Change in the Year 2030.” Turkish Journal of Agriculture - Food Science and Technology 10 (12): 2453. https://doi.org/10.24925/turjaf.v10i12.2453-2466.5555.
  • Derakhshan, S., S. L. Cutter, and C. Wang. 2020. “Remote Sensing Derived Indices for Tracking Urban Land Surface Change in Case of Earthquake Recovery.” Remote Sensing 12 (5): 895. https://doi.org/10.3390/rs12050895.
  • Du, Zhengshun, Xinghua Li, Jianhao Miao, Yanyuan Huang, Huanfeng Shen, and Liangpei Zhang. 2024. “Concatenated Deep-Learning Framework for Multitask Change Detection of Optical and SAR Images.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 17: 719–731. https://doi.org/10.1109/JSTARS.2023.3333959.
  • Du, Bo, Lixiang Ru, Chen Wu, and Liangpei Zhang. 2019. “Unsupervised Deep Slow Feature Analysis for Change Detection in Multi-Temporal Remote Sensing Images.” IEEE Transactions on Geoscience and Remote Sensing 57 (12): 9976–9992. https://doi.org/10.1109/TGRS.2019.2930682.
  • Fang, S., K. Li, J. Shao, and Z. Li. 2022. “SNUNet-CD: A Densely Connected Siamese Network for Change Detection of VHR Images.” IEEE Geoscience and Remote Sensing Letters 19: 1–5. https://doi.org/10.1109/LGRS.2021.3056416.
  • Fang, Bo, Li Pan, and Rong Kou. 2019. “Dual Learning-Based Siamese Framework for Change Detection Using Bi-Temporal VHR Optical Remote Sensing Images.” Remote Sensing 11: 1292. https://doi.org/10.3390/rs11111292.
  • Gao, Feng, Xiao Wang, Junyu Dong, and Shengke Wang. 2018. “Synthetic Aperture Radar Image Change Detection Based on Frequency-Domain Analysis and Random Multigraphs.” Journal of Applied Remote Sensing 12 (1): 1. https://doi.org/10.1117/1.JRS.12.016010.
  • Geng, Jie, Xiaorui Ma, Xiaojun Zhou, and Hongyu Wang. 2019. “Saliency-Guided Deep Neural Networks for SAR Image Change Detection.” IEEE Transactions on Geoscience and Remote Sensing 57: 7365–7377. https://doi.org/10.1109/TGRS.2019.2913095.
  • Han, Liying, Linlin Lu, Junyu Lu, Xintong Liu, Shuangcheng Zhang, Ke Luo, Dan He, Penglong Wang, Huadong Guo, and Qingting Li. 2022. “Assessing Spatiotemporal Changes of SDG Indicators at the Neighborhood Level in Guilin, China: A Geospatial Big Data Approach.” Remote Sensing 14 (19): 4985. https://doi.org/10.3390/rs14194985.
  • He, Kaiming, X. Zhang, Shaoqing Ren, and Jian Sun. 2016. “Deep Residual Learning for Image Recognition.” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–778. https://doi.org/10.1109/cvpr.2016.90.
  • Hu, Hongtao, and Yifang Ban. 2014. “Unsupervised Change Detection in Multitemporal SAR Images Over Large Urban Areas.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (8): 3248–3261. https://doi.org/10.1109/JSTARS.2014.2344017.
  • Huang, Feixiong, Junming Xia, Cong Yin, Xiaochun Zhai, Guanglin Yang, Weihua Bai, Yueqiang Sun, et al. 2023. “Spaceborne GNSS Reflectometry With Galileo Signals on FY-3E/GNOS-II: Measurements, Calibration, and Wind Speed Retrieval.” IEEE Geoscience and Remote Sensing Letters 20: 1–5. https://doi.org/10.1109/LGRS.2023.3241358.
  • Jia, Lu, Ming Li, Yan Wu, Peng Zhang, Gaofeng Liu, Hongmeng Chen, and Lin An. 2015. “SAR Image Change Detection Based on Iterative Label-Information Composite Kernel Supervised by Anisotropic Texture.” IEEE Transactions on Geoscience and Remote Sensing 53: 3960–3973. https://doi.org/10.1109/TGRS.2015.2388495.
  • Jia, Shaocheng, and Wei Yao. 2023. “Joint Learning of Frequency and Spatial Domains for Dense Image Prediction.” ISPRS Journal of Photogrammetry and Remote Sensing 195: 14–28. https://doi.org/10.1016/j.isprsjprs.2022.11.001.
  • Kong, Xiangbin. 2014. “China Must Protect High-Quality Arable Land.” Nature 506 (7486): 7. https://doi.org/10.1038/506007a.
  • Li, Xinghua, Zhengshun Du, Yanyuan Huang, and Zhenyu Tan. 2021. “A Deep Translation (GAN) Based Change Detection Network for Optical and SAR Remote Sensing Images.” ISPRS Journal of Photogrammetry and Remote Sensing 179: 14–34. https://doi.org/10.1016/j.isprsjprs.2021.07.007.
  • Li, Xinghua, Meizhen He, Huifang Li, and Huanfeng Shen. 2022. “A Combined Loss-Based Multiscale Fully Convolutional Network for High-Resolution Remote Sensing Image Change Detection.” IEEE Geoscience and Remote Sensing Letters 19: 1–5. https://doi.org/10.1109/LGRS.2021.3098774.
  • Li, Mengmeng, Xuanguang Liu, Xiaoqin Wang, and Pengfeng Xiao. 2023. “Detecting Building Changes Using Multimodal Siamese Multitask Networks From Very-High-Resolution Satellite Images.” IEEE Transactions on Geoscience and Remote Sensing 61: 1–22. https://doi.org/10.1109/TGRS.2023.3290817.
  • Li, Zhongbin B., Yongjun Zhang, and Mengqiu Wang. 2023b. “Solar Energy Projects put Food Security at Risk.” Science 381 (6659): 740–741. https://doi.org/10.1126/science.adj1614.
  • Li, Haoyang, Fangjie Zhu, Xiaoyu Zheng, Mengxi Liu, and Guangzhao Chen. 2022. “MSCDUNet: A Deep Learning Framework for Built-Up Area Change Detection Integrating Multispectral, SAR, and VHR Data.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 15: 5163–5176. https://doi.org/10.1109/JSTARS.2022.3181155.
  • Liang, Jieyu, Chao Ren, Yi Li, Weiting Yue, Zhenkui Wei, Xiaohui Song, Xudong Zhang, Anchao Yin, and Xiaoqi Lin. 2023. “Using Enhanced Gap-Filling and Whittaker Smoothing to Reconstruct High Spatiotemporal Resolution NDVI Time Series Based on Landsat 8, Sentinel-2, and MODIS Imagery.” ISPRS International Journal of Geo-Information 12 (6): 214. https://doi.org/10.3390/ijgi12060214.
  • Liu, Mengxi, Zhuoqun Chai, Haojun Deng, and Rong Liu. 2022. “A CNN-Transformer Network With Multiscale Context Aggregation for Fine-Grained Cropland Change Detection.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 15: 4297–4306. https://doi.org/10.1109/JSTARS.2022.3177235.
  • Liu, Jia, Maoguo Gong, Kai Qin, and Puzhao Zhang. 2018. “A Deep Convolutional Coupling Network for Change Detection Based on Heterogeneous Optical and Radar Images.” IEEE Transactions on Neural Networks and Learning Systems 29 (3): 545–559. https://doi.org/10.1109/TNNLS.2016.2636227.
  • Liu, Mengxi, and Qian Shi. 2021. “DSAMNet: A Deeply Supervised Attention Metric Based Network for Change Detection of High-Resolution Images.” 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, 6159–6162. https://doi.org/10.1109/IGARSS47720.2021.9555146.
  • Liu, Yansui, Ziwen Zhang, and Yang Zhou. 2018b. “Efficiency of Construction Land Allocation in China: An Econometric Analysis of Panel Data.” Land Use Policy 74: 261–272. https://doi.org/10.1016/j.landusepol.2017.03.030.
  • Lv, Zhiyong, Haitao Huang, Xinghua Li, Minghua Zhao, Jón Atli Benediktsson, Weiwei Sun, and Nicola Falco. 2022. “Land Cover Change Detection With Heterogeneous Remote Sensing Images: Review, Progress, and Perspective.” Proceedings of the IEEE 110: 1976–1991. https://doi.org/10.1109/JPROC.2022.3219376.
  • Lv, Zhiyong, PingDong Zhong, Wen Wang, Zhenzhen You, Jón Atli Benediktsson, and Cheng Shi. 2023a. “Novel Piecewise Distance Based on Adaptive Region Key-Points Extraction for LCCD With VHR Remote-Sensing Images.” IEEE Transactions on Geoscience and Remote Sensing 61: 1–9. https://doi.org/10.1109/TGRS.2023.3268038.
  • Lv, Zhiyong, PingDong Zhong, Wen Wang, Zhenzhen You, and Nicola Falco. 2023b. “Multiscale Attention Network Guided With Change Gradient Image for Land Cover Change Detection Using Remote Sensing Images.” IEEE Geoscience and Remote Sensing Letters 20: 1–5. https://doi.org/10.1109/LGRS.2023.3267879.
  • Matasci, Giona, Nathan Longbotham, Fabio Pacifici, Mikhail Kanevski, and Devis Tuia. 2015. “Understanding Angular Effects in VHR Imagery and Their Significance for Urban Land-Cover Model Portability: A Study of two Multi-Angle in-Track Image Sequences.” ISPRS Journal of Photogrammetry and Remote Sensing 107: 99–111. https://doi.org/10.1016/j.isprsjprs.2015.05.004.
  • Papadomanolaki, Maria, Sagar Verma, Maria Vakalopoulou, Siddharth Gupta, and Konstantinos Karantzalos. 2019. “Detecting Urban Changes with Recurrent Neural Networks from Multitemporal Sentinel-2 Data.” IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, 214–217. https://doi.org/10.1109/IGARSS.2019.8900330.
  • Pirrone, Davide, Francesca Bovolo, and Lorenzo Bruzzone. 2020. “An Approach to Unsupervised Detection of Fully and Partially Destroyed Buildings in Multitemporal VHR SAR Images.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 13: 5938–5953. https://doi.org/10.1109/JSTARS.2020.3026838.
  • Saha, Sudipan, Francesca Bovolo, and Lorenzo Bruzzone. 2021. “Building Change Detection in VHR SAR Images via Unsupervised Deep Transcoding.” IEEE Transactions on Geoscience and Remote Sensing 59 (3): 1917–1929. https://doi.org/10.1109/TGRS.2020.3000296.
  • Saha, Sudipan, Muhammad Shahzad, Patrick Ebel, and Xiao Xiang Zhu. 2022. “Supervised Change Detection Using Prechange Optical-SAR and Postchange SAR Data.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 15: 8170–8178. https://doi.org/10.1109/JSTARS.2022.3206898.
  • Shi, Qian, Mengxi Liu, Shengchen Li, Xiaoping Liu, Fei Wang, and Liangpei Zhang. 2022. “A Deeply Supervised Attention Metric-Based Network and an Open Aerial Image Dataset for Remote Sensing Change Detection.” IEEE Transactions on Geoscience and Remote Sensing 60: 1–16. https://doi.org/10.1109/TGRS.2021.3085870.
  • Shi, Qian, Mengxi Liu, Xiaoping Liu, Penghua Liu, Pengyuan Zhang, Jinxing Yang, and Xia Li. 2020. “Domain Adaption for Fine-Grained Urban Village Extraction from Satellite Images.” IEEE Geoscience and Remote Sensing Letters 17 (8): 1430–1434. https://doi.org/10.1109/LGRS.2019.2947473.
  • Song, Xinyang, Zhen Hua, and Jinjiang Li. 2023. “GMTS: Gnn-Based Multi-Scale Transformer Siamese Network for Remote Sensing Building Change Detection.” International Journal of Digital Earth 16 (1): 1685–1706. https://doi.org/10.1080/17538947.2023.2210311.
  • Tang, Huakang, Honglei Wang, and Xiao-Pei Zhang. 2022. “Multi-class Change Detection of Remote Sensing Images Based on Class Rebalancing.” International Journal of Digital Earth 15 (1): 1377–1394. https://doi.org/10.1080/17538947.2022.2108921.
  • Teng, Yunhe, Shuoxun Liu, Weichao Sun, Huan Yang, Bin Wang, and Jintong Jia. 2023. “A VHR Bi-Temporal Remote-Sensing Image Change Detection Network Based on Swin Transformer.” Remote Sensing 15 (10): 2645. https://doi.org/10.3390/rs15102645.
  • Wang, Moyang, Kun Tan, Xiuping Jia, Xue Wang, and Yu Chen. 2020a. “A Deep Siamese Network with Hybrid Convolutional Feature Extraction Module for Change Detection Based on Multi-Sensor Remote Sensing Images.” Remote Sensing 12: 205. https://doi.org/10.3390/rs12020205.
  • Wang, Qiongjie, Li Yan, Q. Yuan, and Zhenling Ma. 2017. “An Automatic Shadow Detection Method for VHR Remote Sensing Orthoimagery.” Remote Sensing 9 (5): 469. https://doi.org/10.3390/rs9050469.
  • Wang, Xin, Yang Zhao, Tangwen Yang, and Qiuqi Ruan. 2020b. “Multi-Scale Context Aggregation Network with Attention-Guided for Crowd Counting.” 2020 15th IEEE International Conference on Signal Processing (ICSP) 1: 240–245. https://doi.org/10.1109/ICSP48669.2020.9321067.
  • Woo, Sanghyun, Jongchan Park, Joon-Young Lee, and In-So Kweon. 2018. “CBAM: Convolutional Block Attention Module.” Proceedings of the European Conference on Computer Vision (ECCV), 3–19. https://doi.org/10.1007/978-3-030-01234-2_1.
  • Wu, Xin, Danfeng Hong, Jiaojiao Tian, Jocelyn Chanussot, Wei Li, and Ran Tao. 2019. “ORSIM Detector: A Novel Object Detection Framework in Optical Remote Sensing Imagery Using Spatial-Frequency Channel Features.” IEEE Transactions on Geoscience and Remote Sensing 57: 5146–5158. https://doi.org/10.1109/TGRS.2019.2897139.
  • Xia, Yufa, Xin Xu, and Fangling Pu. 2022. “PCBA-Net: Pyramidal Convolutional Block Attention Network for Synthetic Aperture Radar Image Change Detection.” Remote Sensing 14 (22): 5762. https://doi.org/10.3390/rs14225762.
  • Yang, Yuting, Licheng Jiao, F. Liu, Xu Liu, Lingling Li, Puhua Chen, and Shuyuan Yang. 2023. “An Explainable Spatial–Frequency Multiscale Transformer for Remote Sensing Scene Classification.” IEEE Transactions on Geoscience and Remote Sensing 61: 1–15. https://doi.org/10.1109/TGRS.2023.3265361.
  • Yuan, Chaofeng, Yuelei Xu, Jingjing Yang, Zhaoxiang Zhang, and Qing Zhou. 2022a. “A Pseudoinverse Siamese Convolutional Neural Network of Transformation Invariance Feature Detection and Description for a SLAM System.” Machines 10: 1070. https://doi.org/10.3390/machines10111070.
  • Yuan, Panli, Qingzhan Zhao, Xingbiao Zhao, Xuewen Wang, Xue Long, and Yuchen Zheng. 2022b. “A Transformer-Based Siamese Network and an Open Optical Dataset for Semantic Change Detection of Remote Sensing Images.” International Journal of Digital Earth 15: 1506–1525. https://doi.org/10.1080/17538947.2022.2111470.
  • Zeren Cetin, Ilknur, Tugrul Varol, Halil Baris Ozel, and Hakan Sevik. 2023. “The Effects of Climate on Land Use/Cover: A Case Study in Turkey by Using Remote Sensing Data.” Environmental Science and Pollution Research 30 (3): 5688–5699. https://doi.org/10.1007/s11356-022-22566-z.
  • Zhan, Yang, Kun Fu, Menglong Yan, Xian Sun, Hongqi Wang, and Xiaosong Qiu. 2017. “Change Detection Based on Deep Siamese Convolutional Network for Optical Aerial Images.” IEEE Geoscience and Remote Sensing Letters 14 (10): 1845–1849. https://doi.org/10.1109/LGRS.2017.2738149.
  • Zhang, Dafeng, Feiyu Huang, Shizhuo Liu, Xiaobing Wang, and Zhezhu Jin. 2022. “SwinFIR: Revisiting the SwinIR with Fast Fourier Convolution and Improved Training for Image Super-Resolution.” ArXiv. https://doi.org/10.48550/arXiv.2208.11247.
  • Zhang, Liangpei, Lefei Zhang, and Bo Du. 2016. “Deep Learning for Remote Sensing Data: A Technical Tutorial on the State of the Art.” IEEE Geoscience and Remote Sensing Magazine 4 (2): 22–40. https://doi.org/10.1109/MGRS.2016.2540798.
  • Zhao, Xiaoyang, Keyun Zhao, Siyao Li, and Xianghai Wang. 2023. “GeSANet: Geospatial-Awareness Network for VHR Remote Sensing Image Change Detection.” IEEE Transactions on Geoscience and Remote Sensing 61: 1–14. https://doi.org/10.1109/TGRS.2023.3272550.
  • Zhou, Yuan, Yanjie Feng, Shuwei Huo, and Xiaofeng Li. 2022. “Joint Frequency-Spatial Domain Network for Remote Sensing Optical Image Change Detection.” IEEE Transactions on Geoscience and Remote Sensing 60: 1–14. https://doi.org/10.1109/TGRS.2022.3196040.
  • Zhou, Nan, Xiang Li, Zhanfeng Shen, Tianjun Wu, and Jiancheng Luo. 2021. “Geo-Parcel-Based Change Detection Using Optical and SAR Images in Cloudy and Rainy Areas.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14: 1326–1332. https://doi.org/10.1109/JSTARS.2020.3038169.
  • Zhu, Qiqi, Yanan Zhang, Lizeng Wang, Yanfei Zhong, Qingfeng Guan, Xiaoyan Lu, Liangpei Zhang, and Deren Li. 2021. “A Global Context-Aware and Batch-Independent Network for Road Extraction from VHR Satellite Imagery.” ISPRS Journal of Photogrammetry and Remote Sensing 175: 353–365. https://doi.org/10.1016/j.isprsjprs.2021.03.016.
  • Zhuo, L., Bin Liu, Hui Zhang, Shiyu Zhang, and Jiafeng Li. 2021. “MultiRPN-DIDNet: Multiple RPNs and Distance-IoU Discriminative Network for Real-Time UAV Target Tracking.” Remote Sensing 13 (14): 2772. https://doi.org/10.3390/rs13142772.