
MFFNet: a building change detection method based on fusion of spectral and geometric information

Article: 2322053 | Received 21 Nov 2023, Accepted 16 Feb 2024, Published online: 14 Mar 2024

Abstract

Accurate detection and extraction of changes in building heights is important for monitoring construction (both legal and illegal) and assessing disasters. It also provides important information for updating real 3D scenes. However, when using remote sensing images, shadows, vegetation and objects with spectral and morphological characteristics similar to those of buildings can cause false detections, omissions and incomplete patch edges. To address this issue, we develop the multiscale feature fusion network for dual-modal data (MFFNet), which has two main aspects: (1) The multi-dual-modal feature fusion module detects changes in features with spectral and morphological characteristics similar to those of buildings. This mitigates false detections by making the model more aware of areas where the elevation has changed over time. (2) Because building extraction is affected by shadows and vegetation, we designed a multiscale feature shuffle module. It takes multiscale features and establishes relationships between neighbouring pixels using the pixel-shuffle algorithm, then fuses and reorganizes the multiscale features to highlight the relationships between global contexts, thereby mitigating the problem of building occlusion by shadows. Comparative experiments show that MFFNet achieves better results on the GF7-CD and 3DCD datasets than other similar methods. The proposed method can more accurately monitor building changes over large areas.

1. Introduction

With the rapid development of cities, there is a need for up-to-date building information. This is crucial for various purposes, including detecting illegal construction (Zhu et al. Citation2022), assessing disasters (Shafique et al. Citation2022), monitoring construction progress and updating real 3D scenes (He and Chen Citation2019). Currently, building change detection (BCD) primarily relies on 2D remote sensing images (RSIs). There are several effective methods; for example, remote sensing image change detection (RSICD) relies on geographically aligned multitemporal RSIs to detect changes in the ground surface (Singh Citation1989). High-resolution remote sensing imagery (HRRSI) can be used to quickly identify building changes at a large scale; however, it is inadequate for applications that require height change data, such as detecting illegal private constructions, disaster damage analysis, construction progress assessment and realistic 3D scene updating.

According to the data source, BCD can be based on spectral information or geometric information. Using spectral information for change detection (CD) requires the extraction of features from RSIs and analysis of the differences between images taken at different times. In the early years, due to limitations in RSI resolution, independent pixels were commonly used as the basic unit. For example, a pixel-based CD algorithm would perform algebraic operations on a pixel-by-pixel basis to obtain a change difference map and then select a threshold value to determine the corresponding changes, such as by image differencing (Coppin et al. Citation2004) or regression analysis (Lu et al. Citation2004). With the commercialization of high-resolution and submetre remote sensing satellites, object-oriented analysis techniques have been introduced into the field of RSICD, and the basic unit of CD has transitioned from pixels to objects (Chen Z et al. Citation2022). Object-oriented CD primarily utilizes spectral, texture and spatial context information to obtain change information at the object level (Jabari et al. Citation2019). However, CD based on spectral information suffers from two problems: (1) Objects that exhibit spectral and morphological characteristics similar to those of buildings, such as roads, can be erroneously identified as buildings, leading to false detections; for example, the construction of either a house or a road produces similar changes in surface spectral and morphological features. (2) When shadows and vegetation are present around the buildings in an image, the edges of the extracted patches tend to be incomplete. This is primarily due to noticeable differences in spectral values between areas with and without shadows, and between areas with and without vegetation. These variations, which commonly occur at the edges of buildings, contribute to incomplete and inaccurate edge detection in CD results.

Change detection based on geometric information extracts elevation features from 3D data and analyses their differences across time (Zuxun et al. Citation2022). This can be used for tree growth monitoring, earthquake damage assessment and terrain and building CD. Geometric CD can be based on geometric information alone or a fusion of geometric and spectral information. Geometric-only CD methods, such as pixel-based height-difference methods, utilize a single grid point as a unit for identifying changes. These techniques have various applications, including 3D CD in urban areas (Chaabouni-Chouayakh et al. Citation2010, Citation2011) and building change analysis (Jung Citation2004; Sasagawa et al. Citation2013). They identify potential changes by deriving height difference scores. Geometric information plays a crucial role in CD and enhances its accuracy. However, in many cases, errors can make it difficult to obtain good results using geometric information alone. Therefore, the use of additional spectral information can help to compensate for such errors, further enhancing CD reliability. Pang et al. (Citation2018) proposed a CD method based on digital surface models (DSMs) and the original images. It uses the graph cut optimization algorithm to extract the feature change area, which is then combined with the original image data to exclude the influence of trees. Finally, four building change categories are obtained: new construction, heightening, demolition and lowering. However, the initial results of these methods rely heavily on geometric comparisons, and missed detections cannot be retrieved in the subsequent refinement process. In other words, these methods do not effectively combine geometric and spectral information. Qin et al. (Citation2016) proposed an object-based 3D BCD method for multiperiod stereo images. The MeanShift algorithm is applied for segmentation to obtain the objects for each phase of data. This is combined with feature extraction with decision trees and support vector machine (SVM) for supervised classification. Finally, comparative analysis is performed. In this type of method, DSMs are typically integrated as an additional channel in the classification or detection method; the classification methods used include SVM, decision trees and others. In summary, BCD encounters challenges when relying on geometric information: (1) the presence of geometric errors (e.g. registration error) reduces accuracy; (2) combining spectral and geometric information is more accurate but is challenging to perform.

Accordingly, we propose a multiscale feature fusion network (MFFNet) for dual-modal data, which includes the dual-path feature extraction network (DPEN), the multi-dual-modal feature fusion module (MDFM) and the multiscale feature shuffle module (MFSM). It conducts CD of height-varying buildings by effectively combining spectral and geometrical information and considering the relationship between global contexts. The main contributions of this work are as follows:

  • Geometric information is used as a crucial complement to spectral information for BCD. To address challenges in BCD results such as false detections, omissions and incomplete edges, we propose a method that combines spectral and geometric information through the extraction of building elevation features.

  • We developed MFFNet to effectively combine dual-modal data. MFFNet consists of three main modules: the dual-path feature extraction network (DPEN), the MDFM and the MFSM. The DPEN is designed to extract features separately from the dual-modal data. Subsequently, the MDFM is utilized to combine the extracted image and elevation features more effectively. Finally, the MFSM is designed to make the model aware of the relationships between global contexts.

  • To validate the reliability of MFFNet, we conducted experiments using two datasets. Additionally, we conducted ablation experiments for each module to verify the reliability and stability of our model.

2. Related work

In recent years, the rapid development of Earth observation technology has provided increasing amounts of high-resolution imagery from various satellite sensors, which has greatly enriched CD data sources (Chen et al. Citation2020). Different data sources contain different features; for example, optical images contain rich spectral, texture and shape features of ground objects, while nonoptical image sources, such as LIDAR or SAR, can provide information based on different ground physical mechanisms and are less affected by weather conditions than optical sensors (Zhang et al. Citation2021). We can categorize CD according to whether it is based on single or multiple data sources.

2.1. Change detection based on a single data source

We introduce CD based on a single data source from two aspects: (1) CD based on geometric information and (2) CD based on spectral information. CD based on geometric information can calculate metrics in two common ways: height difference and Euclidean distance. Turker and Cetinkaya (Citation2005) utilized DSMs generated from pre- and post-earthquake stereo aerial images to detect collapsed buildings. However, pixel-based height-difference methods are prone to false detections due to alignment errors and DSM inaccuracy. To address this issue, researchers have proposed window/object-based methods. For example, Tian et al. (Citation2010) used the minimum height difference in a moving window to reduce DSM noise at object boundaries. In their subsequent work, the use of objects obtained from panchromatic images as height-difference units further reduced false detections. Another problem with height-difference methods is that they are sensitive to alignment and matching errors. The more theoretically rigorous surface Euclidean distance methods can compensate for such shortcomings to some extent. For example, Waser et al. (Citation2008) applied the Euclidean distance between surfaces to estimate forest volume changes based on DSMs obtained from image matching. This inter-surface Euclidean distance-based method is robust to alignment errors in top-view 3D data and can be applied to full 3D data, although it is computationally complex and time-consuming. Therefore, although using inter-surface Euclidean distances is theoretically more rigorous than using height differences, the latter remains the most convenient and effective method.

While different types of images (optical, SAR, multispectral, etc.) are used in CD, most of the deep learning algorithms used for urban CD only use optical images (Hafner et al. Citation2022). Spectral information is now increasingly used in CD. These methods include decision trees (Ye et al. Citation2018; Xie et al. Citation2019), random forests (Désir et al. Citation2012; Bai et al. Citation2018; Feng and Li Citation2019) and SVMs (Zhigao et al. Citation2006; Bovolo et al. Citation2008; Wei et al. Citation2015). With the rise of deep neural network research, many scholars have applied deep learning methods to CD. Gedara et al. (Citation2022) proposed a method that integrates a hierarchically structured transformer encoder with a multilayer perceptron (MLP) decoder in a Siamese network architecture. This approach enables efficient processing of multiscale long-range details, facilitating accurate CD. Similarly, Zhang et al. (Citation2023) introduced a global-aware Siamese network (GAS-Net) aimed at generating global-aware features for efficient CD; their approach incorporates an understanding of the relationships between scenes and foregrounds. Hou et al. (Citation2021) developed the high-resolution triplet network (HRTNet) framework, incorporating a dynamic inception module to enhance multiscale feature information. In a separate study, Chen Z et al. (Citation2022) proposed EGDE-Net, a method that prioritizes boundary accuracy and the integrity of change regions.

Although all these methods have achieved good detection results in specific scenarios, it is difficult to obtain ideal CD results with geometric information alone due to the presence of geometric errors. Meanwhile, spectral information can be affected by changes in features with similar spectral and morphological characteristics, as well as shadows and vegetation around buildings, leading to false detections, omissions and incomplete building edges. Therefore, some scholars have attempted to combine spectral and geometric information to improve CD accuracy.

2.2. Change detection based on multiple data sources

The different data sources that can be used for CD contain different features. This paper focuses on combining optical imagery and DSM for CD. Some methods consider both geometric and spectral information; for example, rule-based categorization (Tian et al. Citation2011, Citation2015; Nebiker et al. Citation2014; Lak et al. Citation2016), SVM (Chaabouni-Chouayakh and Reinartz Citation2011; Malpica et al. Citation2013; Tu et al. Citation2017), the graph cut method (Du et al. Citation2016) and random forest (Chen et al. Citation2016). However, these have strict parameter requirements, and incorrect parameter settings can lead to errors in the CD results. Yang Y et al. (Citation2021) proposed a CD method based on multilevel segmentation of densely matched point clouds extracted from UAV images, achieving multilevel segmentation and change extraction through chromatic heterogeneity. This post-refinement method is relatively flexible and effective, with parameters that are easy to understand and adjust. Tian et al. (Citation2013) fused changes in elevation and spectral information into a modelling framework that only requires a single metric to be adjusted to obtain CD results. Subsequently, Tian et al. (Citation2014) employed Dempster–Shafer fusion theory to extract building changes by combining DSM elevation changes with Kullback–Leibler divergence similarity measure derived from original images. Qin (Citation2014) proposed a CD method based on high-resolution stereo images and the LoD2 model. It utilizes an unsupervised Self-Organizing Map (SOM) to fuse multichannel metrics, including DSM and spectral features, enabling classification of different categories. Another 3D CD method (Pang et al. Citation2019) uses a joint hyper-pixel graph cut optimization approach. This method transforms BCD into a binary classification problem. First, a joint hyper-pixel object is acquired using the SLIC hyper-pixel segmentation method. Then, the hyper-pixel object is utilized as a processing unit to extract multidimensional change features. Finally, the optimal solution is obtained by employing the graph cut optimization framework. Zhang et al. (Citation2021) proposed a CD framework called W-Net, which is capable of handling both homogeneous and heterogeneous remote sensing data for BCD. The bidirectionally symmetric end-to-end network architecture of W-Net supports the input of homogeneous or heterogeneous remote sensing data for 2D or 3D BCD. MAHNet, as proposed by Pan, Li, et al. (Citation2022), combines two data sources, DSM and GF7, to perform 3D CD tasks.

The above methods combine geometric and spectral information in different ways to accomplish CD from RSIs. Yet challenges remain; in particular, effectively combining two types of information with different physical mechanisms is difficult. MFFNet is designed to optimize the fusion of spectral and geometric information and establish robust associations with global contextual information. By doing so, we can effectively reduce false detections and omissions while improving the completeness of building edges.

3. Methodology

In this section, we introduce the proposed method in three parts: (1) The basic network architecture of MFFNet, (2) the MDFM and (3) the MFSM.

3.1. Description of the MFFNet framework

The network structure of MFFNet is shown in Figure 1. It is mainly composed of an encoder, the MDFM and a decoder. The encoder uses a dual-path feature-extraction network (DPEN), the MDFM fuses image and elevation features at different scales and the decoder uses the MFSM to consider the global contextual relationships.

Figure 1. Schematic diagram of the proposed MFFNet, which comprises (left) an encoder consisting of the dual-path feature extraction network (DPEN), (middle) the MDFM and (right) a decoder consisting of the MFSM.


3.1.1. Encoder

To extract features more fully in the encoding stage and to account for the different data structures, we adopt the DPEN to extract the deep-level features of the HRRSI and DSM. The HRRSI provides building shape, edge and texture details that are richer than those of ordinary optical images. Therefore, feature extraction from the HRRSI uses ResNet-34 (He et al. Citation2016), a powerful feature extractor, to mine multilayer features such as spectral, geometric and texture features; it has four residual stages, denoted {R1, R2, R3, R4}. The DSM data structure is relatively simple, containing only surface elevation information in a single band. To prevent overfitting (Yang X et al. Citation2021), we use an FCN (Long et al. Citation2015) instead of ResNet-34 for its feature extraction; it contains four standard convolution layers {C1, C2, C3, C4}. The validity of the dual-path extraction network is verified in Subsection 4.6. To optimize the utilization of the extracted features, we pass them through the MDFM. The purpose of doing this is to effectively combine image and elevation features and highlight their differences, thereby mitigating false detections caused by different objects that exhibit the same spectral and morphological characteristics as buildings.
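To make the encoder concrete, the following is a minimal PyTorch sketch of a dual-path encoder of this kind, assuming torchvision's ResNet-34 for the HRRSI branch and a plain four-stage FCN for the DSM branch; the class name, channel widths and strides of the DSM branch are illustrative assumptions rather than the released implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34


class DPEN(nn.Module):
    """Dual-path encoder sketch: ResNet-34 for the HRRSI, a simple FCN for the DSM."""

    def __init__(self):
        super().__init__()
        backbone = resnet34(weights=None)
        # Stem shared by the four residual stages R1-R4.
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.res_stages = nn.ModuleList([backbone.layer1, backbone.layer2,
                                         backbone.layer3, backbone.layer4])
        # DSM branch: four standard conv blocks C1-C4, matching the image scales.
        dsm_cfg = [(1, 64, 4), (64, 128, 2), (128, 256, 2), (256, 512, 2)]
        self.fcn_stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(cin, cout, 3, stride=s, padding=1),
                          nn.BatchNorm2d(cout), nn.ReLU(inplace=True))
            for cin, cout, s in dsm_cfg])

    def forward(self, hrrsi, dsm):
        img_feats, dsm_feats = [], []
        x = self.stem(hrrsi)
        for stage in self.res_stages:
            x = stage(x)
            img_feats.append(x)
        y = dsm
        for stage in self.fcn_stages:
            y = stage(y)
            dsm_feats.append(y)
        return img_feats, dsm_feats  # two lists of four multiscale features


# Example: a 256 x 256 input yields features at 1/4, 1/8, 1/16 and 1/32 resolution.
if __name__ == "__main__":
    enc = DPEN()
    img = torch.randn(1, 3, 256, 256)
    dsm = torch.randn(1, 1, 256, 256)
    f_img, f_dsm = enc(img, dsm)
    print([f.shape for f in f_img])
```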

3.1.2. Decoder

The features from the MDFM are sent to the MFSM, which proceeds to fuse and shuffle the features at different scales, enabling the model to capture global contextual relationships. Finally, a convolutional layer and a sigmoid activation function are utilized to complete the CD task.

3.2. The multi-dual-modal feature fusion module

After passing through the DPEN, rich image and elevation features have been extracted from the dual-modal data. However, these two feature types are not yet combined effectively. In MAHNet (Pan, Li, et al. Citation2022), they are combined using channel stacking. This combines all the extracted features, although not all features are helpful for discriminating differences between images (Hu et al. Citation2018), and excessive use of irrelevant features makes model training more difficult (Ronneberger et al. Citation2015). Therefore, it is important to combine the two extracted feature types efficiently. Inspired by the Convolutional Block Attention Module (CBAM) (Woo et al. Citation2018), we propose the MDFM to combine the two feature types more efficiently. The MDFM consists of multiple dual-modal feature-fusion modules (DFMs), which are described below (Figure 2).

Figure 2. The DFM architecture incorporates a fusion layer, channel attention module and spatial attention module-m.


Let $x_{Image}$ and $x_{DSM}$ represent the image and elevation features, respectively. The dual-modality fusion is defined as:
(1) $F' = f_{FL}(x_{Image}, x_{DSM})$
(2) $F'' = f_{CAM}(F') \otimes F'$
(3) $F''' = f_{SAM\text{-}m}(F'') \otimes F''$
where $f_{FL}$ stands for the Fusion Layer, $f_{CAM}$ stands for the Channel Attention function and $f_{SAM\text{-}m}$ stands for the Spatial Attention-m function. The symbol $\otimes$ denotes element-wise multiplication, $F'$ represents the result of the fusion layer, $F''$ denotes the result of the CAM and $F'''$ denotes the final output.

The details of the Fusion Layer are shown in Figure 3. The function $f_{FL}$ is defined as follows:
(4) $x_{concat} = x_{Image} \oplus x_{DSM}$
where $\oplus$ represents the concatenation operation and $x_{Image}$ and $x_{DSM}$ represent the features on the two different branches.
(5) $f_{FL}(x_{Image}, x_{DSM}) = f_{DropoutBatchNorm}\left(f_{Relu(3\times3)}\left(f_{DropoutBatchNorm}\left(f_{Relu(3\times3)}(x_{concat})\right)\right)\right)$
where $f_{Relu(3\times3)}$ represents a 3 × 3 convolution layer followed by the ReLU activation function, and $f_{DropoutBatchNorm}$ represents a BatchNorm layer followed by a Dropout layer.
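A minimal PyTorch sketch of the Fusion Layer as reconstructed in Eqs. (4) and (5); the dropout rate and the choice to reduce the concatenated channels back to the input width are assumptions.

```python
import torch
import torch.nn as nn


class FusionLayer(nn.Module):
    """Fusion layer sketch: concatenate the two modalities, then apply two
    Conv3x3 -> ReLU -> BatchNorm -> Dropout blocks (Eqs. 4-5)."""

    def __init__(self, in_channels, dropout=0.1):
        super().__init__()

        def block(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cout, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.BatchNorm2d(cout),
                nn.Dropout2d(dropout),
            )

        # Concatenating image and DSM features doubles the channel count.
        self.body = nn.Sequential(block(2 * in_channels, in_channels),
                                  block(in_channels, in_channels))

    def forward(self, x_image, x_dsm):
        x_concat = torch.cat([x_image, x_dsm], dim=1)  # Eq. (4)
        return self.body(x_concat)                     # Eq. (5)
```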

Figure 3. Details of the fusion layer.


The Channel Attention Module is defined as:
(6) $f_{CAM}(F') = \sigma\left(MLP(AvgPool(F')) + MLP(MaxPool(F'))\right)$
where the MLP weights are shared and $\sigma$ represents the sigmoid function. AvgPool stands for the average pooling function and MaxPool stands for the maximum pooling function.
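A sketch of this channel attention function in PyTorch, following the standard CBAM formulation; the reduction ratio of the shared MLP is an assumed hyperparameter.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Channel attention sketch (Eq. 6): sigmoid(MLP(avg) + MLP(max))."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        # Weight-shared MLP implemented with 1x1 convolutions.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, f):
        attn = torch.sigmoid(self.mlp(self.avg_pool(f)) + self.mlp(self.max_pool(f)))
        return attn  # multiplied element-wise with f outside, as in Eq. (2)
```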

Spatial Attention Module-m (SAM-m). Inspired by CBAM-S (Pan, Cui, et al. Citation2022), we adapted the spatial attention module to better combine the two different feature types. The formula for SAM-m is as follows:
(7) $f_{SAM\text{-}m}(F'') = f_{\sigma(3\times3)}\left(f_{\sigma(3\times3)}\left(f_{\sigma(3\times3)}([F_{avg}; F_{max}])\right)\right)$
where $f_{\sigma(3\times3)}$ means that a 3 × 3 convolution is performed first and then the sigmoid activation function is applied, $F_{avg}$ is the result of average pooling, $F_{max}$ is the result of maximum pooling and $[\,;\,]$ stands for a concatenation operation.
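A sketch of SAM-m as reconstructed in Eq. (7): the channel-wise average and maximum maps are concatenated and passed through three successive 3 × 3 convolutions, each followed by a sigmoid. The intermediate channel counts are assumptions.

```python
import torch
import torch.nn as nn


class SAMm(nn.Module):
    """Spatial Attention Module-m sketch (Eq. 7): three conv3x3 + sigmoid
    stages applied to the concatenated [avg; max] spatial descriptors."""

    def __init__(self):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(2, 2, kernel_size=3, padding=1),
            nn.Conv2d(2, 2, kernel_size=3, padding=1),
            nn.Conv2d(2, 1, kernel_size=3, padding=1),
        ])

    def forward(self, f):
        f_avg = torch.mean(f, dim=1, keepdim=True)    # channel-wise average
        f_max, _ = torch.max(f, dim=1, keepdim=True)  # channel-wise maximum
        x = torch.cat([f_avg, f_max], dim=1)
        for conv in self.convs:
            x = torch.sigmoid(conv(x))
        return x  # spatial attention map, multiplied with f as in Eq. (3)
```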

To better control the semantic gap and reduce information redundancy, the MDFM adds a DFM to the skip connections of each layer of the two branching encoders, as shown in Figure 4. The advantage of this approach is that it allows a better combination of features at the different scales of the two modalities and reduces the effect of irrelevant information. The dimensions of the feature maps $D_i$ are 64 × 64 × 64, 128 × 32 × 32, 256 × 16 × 16 and 512 × 8 × 8.
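Putting the pieces together, the sketch below shows one way a DFM (Eqs. 1–3) and the MDFM could be wired, assuming the FusionLayer, ChannelAttention and SAMm classes from the sketches above are in scope; it is illustrative only.

```python
import torch.nn as nn


class DFM(nn.Module):
    """DFM sketch (Eqs. 1-3): fuse, then apply channel and spatial attention.
    FusionLayer, ChannelAttention and SAMm are the sketches given above."""

    def __init__(self, channels):
        super().__init__()
        self.fusion = FusionLayer(channels)
        self.cam = ChannelAttention(channels)
        self.sam_m = SAMm()

    def forward(self, x_image, x_dsm):
        f = self.fusion(x_image, x_dsm)  # Eq. (1)
        f = self.cam(f) * f              # Eq. (2)
        f = self.sam_m(f) * f            # Eq. (3)
        return f


class MDFM(nn.Module):
    """MDFM sketch: one DFM on the skip connection of each encoder scale."""

    def __init__(self, channels=(64, 128, 256, 512)):
        super().__init__()
        self.dfms = nn.ModuleList([DFM(c) for c in channels])

    def forward(self, img_feats, dsm_feats):
        # img_feats / dsm_feats: lists of features D1..D4 with shapes
        # (B,64,64,64), (B,128,32,32), (B,256,16,16), (B,512,8,8).
        return [dfm(fi, fd) for dfm, fi, fd in zip(self.dfms, img_feats, dsm_feats)]
```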

Figure 4. Structural diagram of the MDFM.


The attention visualization of $D_i$ is shown in Figure 5, where yellow areas are those the model attends to. We can see that each DFM layer makes the model attend to different places.

Figure 5. Visual attention maps.


3.3. Multiscale feature shuffle module

The previous section introduced the MDFM, as shown in Figure 4. After the features pass through the MDFM, the model can discriminate the differences between the two phases. Therefore, it is crucial to explore how to fully utilize these features of different scales. However, current upsampling methods do not sufficiently consider the relationships between pixels, which may result in the loss of important information. To fully exploit the image and elevation features, we introduce the pixel-shuffle technique (Shi et al. Citation2016), which preserves more geometric and spectral information, thereby yielding higher quality upsampling results. The algorithmic mechanism of pixel-shuffle is shown in Figure 6.
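Pixel-shuffle rearranges an r²-fold channel expansion into an r-fold spatial upsampling, so that neighbouring output pixels come from channels of the same input location. A minimal PyTorch illustration:

```python
import torch
import torch.nn as nn

# Pixel-shuffle with upscale factor r maps (B, C*r^2, H, W) -> (B, C, r*H, r*W).
r = 2
shuffle = nn.PixelShuffle(r)
x = torch.randn(1, 4 * r * r, 8, 8)  # 16 channels at 8x8
y = shuffle(x)
print(y.shape)                        # torch.Size([1, 4, 16, 16])
```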

Figure 6. Details of the pixel-shuffle operation.


In feature extraction networks (Simonyan and Zisserman Citation2014; Huang et al. Citation2017), deep convolution can extract high-dimensional features; however, due to the lack of target location information, high-dimensional features may not provide accurate location information. To solve this problem, the feature pyramid network (FPN) (Lin et al. Citation2017) has been proposed for RSICD. Although other scholars have adopted multiscale fusion strategies (Peng et al. Citation2019), none of these methods establishes connections between global pixels. To solve this problem, we propose the MFSM, as shown in Figure 7. First, the multiscale features consider the relationships between neighbouring pixels through pixel-shuffle. Second, they are recovered to the original size by transposed convolutions of different sizes. Finally, the multiscale features are fused and shuffled to capture global contextual relationships. From left to right, the dimensions of the grey feature maps are $\{X_1^0 \in \mathbb{R}^{C_1\times\frac{H}{16}\times\frac{W}{16}}, X_2^0 \in \mathbb{R}^{C_2\times\frac{H}{8}\times\frac{W}{8}}, X_3^0 \in \mathbb{R}^{C_3\times\frac{H}{4}\times\frac{W}{4}}, X_4^0 \in \mathbb{R}^{C_4\times\frac{H}{2}\times\frac{W}{2}}\}$. The left-to-right yellow feature maps are $\{X_1^1 \in \mathbb{R}^{64\times\frac{H}{2}\times\frac{W}{2}}, X_2^1 \in \mathbb{R}^{32\times\frac{H}{2}\times\frac{W}{2}}, X_3^1 \in \mathbb{R}^{16\times\frac{H}{2}\times\frac{W}{2}}, X_4^1 \in \mathbb{R}^{8\times\frac{H}{2}\times\frac{W}{2}}\}$. The green feature maps, in order from left to right, are $\{X_1^2, X_2^2, X_3^2, X_4^2\} \in \mathbb{R}^{1\times H\times W}$. The dimension of the resulting map is $X_{total} \in \mathbb{R}^{1\times H\times W}$. The constraints for the MFSM are shown in Eq. (8):
(8)
$X_1^2:\; X_1^0 \xrightarrow{P(16)} X_1^1 \xrightarrow{T(4)+conv} X_1^2$
$X_2^2:\; X_2^0 \xrightarrow{P(8)} X_2^1 \xrightarrow{T(8)+conv} X_2^2$
$X_3^2:\; X_3^0 \xrightarrow{P(4)} X_3^1 \xrightarrow{T(16)+conv} X_3^2$
$X_4^2:\; X_4^0 \xrightarrow{P(2)} X_4^1 \xrightarrow{T(32)+conv} X_4^2$
$X_{total} = conv\left(\sum_{n=1}^{4} X_n^2\right)$
where $P(\cdot)$ represents the pixel-shuffle operation, $\sum$ represents the concatenation operation, $T(\cdot)$ represents the transposed convolution, $conv$ represents a 1 × 1 convolution and $\{X_1^1, X_2^1, X_3^1, X_4^1\}$ represents the feature maps that have been adjusted by the pixel-shuffle algorithm.
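A sketch of this MFSM pipeline in PyTorch for a 256 × 256 input: each scale is first rearranged with pixel-shuffle, then restored to full resolution with a transposed convolution and reduced to one channel with a 1 × 1 convolution, and the four maps are concatenated and fused. The shuffle factors and upsampling strides below are illustrative choices that make the shapes consistent, not the exact values used in the paper.

```python
import torch
import torch.nn as nn


class MFSM(nn.Module):
    """MFSM sketch: pixel-shuffle each scale, upsample to full resolution
    with a transposed convolution, reduce to one channel, then fuse."""

    def __init__(self, channels=(64, 128, 256, 512), shuffle=2,
                 up_strides=(2, 4, 8, 16)):
        super().__init__()
        self.branches = nn.ModuleList()
        for c, s in zip(channels, up_strides):
            c_shuffled = c // (shuffle ** 2)
            self.branches.append(nn.Sequential(
                nn.PixelShuffle(shuffle),                     # P(.) in Eq. (8)
                nn.ConvTranspose2d(c_shuffled, c_shuffled,
                                   kernel_size=s, stride=s),  # T(.) in Eq. (8)
                nn.Conv2d(c_shuffled, 1, kernel_size=1),      # conv in Eq. (8)
            ))
        self.fuse = nn.Conv2d(len(channels), 1, kernel_size=1)

    def forward(self, feats):
        # feats: [(B,64,64,64), (B,128,32,32), (B,256,16,16), (B,512,8,8)]
        maps = [branch(f) for branch, f in zip(self.branches, feats)]
        return self.fuse(torch.cat(maps, dim=1))  # (B, 1, 256, 256)


if __name__ == "__main__":
    feats = [torch.randn(1, 64, 64, 64), torch.randn(1, 128, 32, 32),
             torch.randn(1, 256, 16, 16), torch.randn(1, 512, 8, 8)]
    print(MFSM()(feats).shape)  # torch.Size([1, 1, 256, 256])
```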

Figure 7. Structural diagram of the MFSM.


4. Experiments

4.1. Datasets

Validation was carried out on two datasets: (1) GF7-CD, which was produced for this study and (2) 3DCD, which is a publicly available dataset. The two datasets come from different sensors, which helps to validate the generalization ability of the proposed model.

4.1.1. GF7-CD dataset

This dataset comprises RSIs of Jinhua City, Zhejiang Province, China. They cover an area of 47.76 km2 and contain multiple types of features and complex surfaces. After strict pre-processing of the experimental data, a BCD dataset containing both HRRSI and DSM was produced. The spatial resolution of the optical image is 0.65 m, and the spatial resolution of the DSM extracted from the GF-7 satellite is 1 m. To ensure consistency of the training data, we spatially resampled the 1 m DSM data to 0.65 m to match the optical image. Finally, the elevation values of the DSM data were normalized to between 0 and 255. The training set contained 5040 sets of data, the validation set contained 990 sets of data and the test set contained 114 sets of data.
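A sketch of the DSM preprocessing described above (resampling the 1 m DSM to the 0.65 m grid of the optical image and normalizing elevations to 0–255), assuming the DSM has already been read into a NumPy array; the interpolation order is an assumed choice.

```python
import numpy as np
from scipy.ndimage import zoom


def preprocess_dsm(dsm, src_res=1.0, dst_res=0.65):
    """Resample a DSM array to the optical-image resolution and rescale its
    elevation values to the 0-255 range (a sketch of the steps above)."""
    # Resample: a 1 m grid becomes roughly 1.54x denser on a 0.65 m grid.
    dsm_resampled = zoom(dsm, src_res / dst_res, order=1)  # bilinear-style
    # Normalize elevations to 0-255.
    dmin, dmax = dsm_resampled.min(), dsm_resampled.max()
    dsm_norm = (dsm_resampled - dmin) / max(dmax - dmin, 1e-6) * 255.0
    return dsm_norm.astype(np.float32)


if __name__ == "__main__":
    dsm = np.random.rand(100, 100) * 50 + 100  # fake elevations in metres
    print(preprocess_dsm(dsm).shape)           # roughly (154, 154)
```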

4.1.2. 3DCD dataset

The 3DCD dataset was proposed by Marsocci et al. (Citation2023). According to Qin et al. (Citation2016), this type of dataset can be categorized as an overhead dataset. It contains 472 pairs of images cropped from optical orthophotographs obtained through two different aerial surveys and 472 pairs of DSMs generated by rasterizing point clouds obtained from two different LIDAR flights. The data cover only the urban centre of the city of Valladolid, Spain, including the surrounding commercial area. In addition, only changes to man-made structures, due to events such as the construction and demolition of buildings/roads/bridges, are included in this dataset.

4.2. Evaluation metrics

We used evaluation metrics that are commonly used in binary classification problems: F1-score, Precision (P), Recall (R) and overall accuracy (OA). The calculation formulas are as follows:
(9) $F1\text{-}score = \frac{2 \times Precision \times Recall}{Precision + Recall}$
(10) $Precision = \frac{TP}{TP+FP}$
(11) $Recall = \frac{TP}{TP+FN}$
(12) $OA = \frac{TP+TN}{TP+FP+TN+FN}$
where TP and TN denote the numbers of correctly categorized changed and unchanged pixels, respectively, and FP and FN denote the numbers of pixels incorrectly categorized as changed and unchanged, respectively.
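A short sketch computing these four metrics from binary change maps with NumPy:

```python
import numpy as np


def change_detection_metrics(pred, gt):
    """Compute Precision, Recall, F1-score and OA (Eqs. 9-12) from binary
    prediction and ground-truth change maps (1 = changed, 0 = unchanged)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)
    tn = np.sum(~pred & ~gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    precision = tp / (tp + fp + 1e-10)
    recall = tp / (tp + fn + 1e-10)
    f1 = 2 * precision * recall / (precision + recall + 1e-10)
    oa = (tp + tn) / (tp + fp + tn + fn)
    return {"Precision": precision, "Recall": recall, "F1": f1, "OA": oa}
```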

4.3. Implementation details

The experiments were run on a workstation with a Windows 10 operating system, an Intel(R) i7-9850 CPU and an NVIDIA Quadro RTX5000 graphics card with 11 GB of memory. We used PyTorch 1.7.1 as the deep learning framework and programmed in Python 3.6.13. For the optimization and loss function, we employed the Adam optimizer and used the binary cross-entropy loss function to update the model parameters. The initial learning rate was set to $1\times10^{-5}$. Additionally, we set the batch size to 2, with the program configured to stop when the F1-score on the validation dataset had remained stable for 20 epochs.
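A sketch of a training loop under these settings (Adam, binary cross-entropy, learning rate 1 × 10⁻⁵, batch size 2, early stopping once the validation F1-score has not improved for 20 epochs); the model, the data loaders and the `evaluate_f1` helper are placeholders, not the released training code.

```python
import torch
import torch.nn as nn


def train(model, train_loader, val_loader, device="cuda", patience=20):
    """Training sketch matching the settings above; `model` and the loaders
    stand in for MFFNet and the GF7-CD/3DCD datasets."""
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
    criterion = nn.BCELoss()  # the model ends with a sigmoid activation
    best_f1, stale, epoch = 0.0, 0, 0
    while stale < patience:
        epoch += 1
        model.train()
        for image, dsm, label in train_loader:  # batch size 2
            image, dsm, label = image.to(device), dsm.to(device), label.to(device)
            optimizer.zero_grad()
            loss = criterion(model(image, dsm), label)
            loss.backward()
            optimizer.step()
        f1 = evaluate_f1(model, val_loader, device)  # assumed helper function
        if f1 > best_f1:
            best_f1, stale = f1, 0
            torch.save(model.state_dict(), "mffnet_best.pth")
        else:
            stale += 1
        print(f"epoch {epoch}: val F1 = {f1:.4f}")
```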

We resized the starting input image to 256 × 256 pixels. To perform feature extraction, we utilized Resnet-34 and FCN. This allowed us to generate four scaled features: Image (64 × 64 × 64, 128 × 32 × 32, 256 × 16 × 16, 512 × 8 × 8) and DSM (64 × 64 × 64, 128 × 32 × 32, 256 × 16 × 16, 512 × 8 × 8).

4.4. Comparison methods

To validate the effectiveness of the proposed method, we selected several current popular CD networks for comparison: MapsNet, STANet, FC_EFNet, IFNet, SNUNet and BIT. The details are as follows:

  1. MapsNet (Pan, Cui, et al. Citation2022): A multiattention module that provides good results and robustness in complex urban scenarios.

  2. STANet (Chen and Shi Citation2020): This network embeds spatial-temporal attention to better accomplish CD tasks by capturing spatial-temporal dependencies at different scales.

  3. FC_EFNet (Daudt et al. Citation2018): Concatenates the before/after images along the channel dimension as input. This method is based on an early fusion convolutional network.

  4. IFNet (Zhang et al. Citation2020): This network first extracts highly representative deep features of dual-time images using a full convolutional dual-stream architecture. Then, the extracted deep features are input into a deep supervised disparity discriminant network for change detection.

  5. SNUNet (Fang et al. Citation2022): This network mitigates the loss of deep localization information in the neural network through compact information transfer between the encoder and decoder and between the decoder and decoder for better CD results.

  6. BIT (Chen H et al. Citation2022): This network improves CD by expressing the bitemporal images as a small set of semantic tokens and modelling the spatial-temporal context with a transformer encoder.

4.5. Experimental results

The performance of MFFNet on the two datasets was assessed both qualitatively and quantitatively. Changed areas are shown in white and unchanged areas in black. The evaluation metrics described in Subsection 4.2 were used to compare MFFNet with six other methods. The quantitative evaluation results for the GF7-CD and 3DCD datasets are shown in Tables 1 and 2, respectively, with the best results highlighted in bold.

Table 1. Quantitative assessment of different methods on the GF7-CD dataset.

Table 2. Quantitative assessment of different methods on the 3DCD dataset.

4.5.1. GF7-CD dataset

Table 1 compares the accuracy of MFFNet with that of the other CD methods. MFFNet is superior to the other methods in almost all evaluation indexes. In terms of F1-Score, our method is 1.32–12.02% better than the other methods. The MapsNet model achieves the second-highest F1-Score after MFFNet, which can be attributed to its effective design for detecting minor changes in complex scenes. STANet achieves the highest Precision of 97.66%, although this comes with a higher number of omissions; it still yields a higher F1-Score than the remaining methods. FC_EFNet shows lower performance across the various evaluation metrics. Although the IFNet and SNUNet models slightly outperform our method in terms of Precision and Recall, their F1-Scores are significantly lower. The experimental results demonstrate that MFFNet effectively reduces omissions and false detections. The results are visualized in Figure 8, where regions with significant improvement are highlighted with dashed boxes. MapsNet, FC_EFNet, IFNet, SNUNet and BIT produce a higher number of false detections, as shown in the dashed box regions in Figure 8. The figure clearly illustrates the significant improvements of MFFNet over the other CD methods in terms of reducing false detections and omissions, while also achieving more complete edges. The traditional methods misclassify playgrounds and flower beds as building changes because they focus exclusively on image features and neglect crucial elevation features, leading to false detections.

Figure 8. Visualization of building change detection results on the GF7-CD dataset.


4.5.2. 3DCD dataset

Table 2 presents the quantitative evaluation results. MFFNet achieves an OA of 98.59%, Precision of 86.05%, Recall of 72.50% and F1-Score of 78.70%, thereby outperforming the other methods across all evaluation metrics. The visualization results are shown in Figure 9. In Figure 9, the middle part of the bridge is spectrally similar to the ground surface, so all methods that do not consider elevation information produce false detections there. Due to changes in the surface spectral information around the buildings, the traditional methods produce false detections, whereas the edges of the buildings extracted by MFFNet (which considers elevation information) are complete. Spectral differences caused by illumination also lead to false detections in all the other methods that do not consider elevation information. For the same building shown in the dashed box of Figure 9, pseudo-change caused by differences in spectral information leads to false detections in all six of the remaining methods.

Figure 9. Visualization of building change detection results on the 3DCD dataset.


Overall, in terms of quantitative results, MFFNet shows the best performance on both datasets, with significant reductions in false detections and omissions. This is mainly due to the fact that the HRRSI provides rich shape, edge and texture details of buildings combined with the surface elevation information provided by DSM. The dual-path extraction network module proposed in this paper can extract image features and elevation features well, which are then effectively combined by the DFM. Then, the MFSM considers the relationships with the global context to better accomplish the task of BCD.

4.6. Dual-path feature-extraction network comparison experiment

This study introduces the DPEN for feature extraction from optical images and DSMs. We designed specialized encoders to accommodate the distinct data structures. When dealing with input data of two different structures, the dual-path feature-extraction network learns complex features more efficiently than a Siamese-structure feature extraction network (SSEN). To identify the most suitable DPEN for both the optical image and DSM data structures, we performed comparative experiments on the GF7-CD dataset using various feature extraction networks.

4.6.1. Siamese structure feature extraction network

This network consists of two identical branches, each dedicated to processing the dual time-phase optical images and DSMs. Both branches utilize identical feature-extraction networks and share weights to ensure consistent feature extraction for both types of data.

4.6.2. Dual-path feature-extraction network

This network is comprised of two distinct extraction networks. One primarily utilizes ResNet, while the other employs the simpler FCN. To assess the impact of combining different forms of ResNet and FCN, we conducted comparative experiments by pairing ResNet-18 and ResNet-34 with FCN.

As indicated in Table 3, SSEN-18 achieves the highest R-value among the four methods, reaching 92.13%; however, its remaining evaluation metrics are lower. Conversely, DPEN-34 exhibits superior performance in the other three metrics. DPEN-34 utilizes ResNet-34 for optical image feature extraction and the FCN for DSM feature extraction. As illustrated in Figure 10, DPEN-34 produces clearer boundaries and minimal discontinuity within the changed regions. Comparing the experimental accuracy and visualization results, DPEN-34 proves to be the most suitable configuration for this study.

Figure 10. Visualization of the comparative experiments of dual-path extraction networks applied to the GF7-CD dataset.


Table 3. Results of the dual-path extraction network comparative experiments.

4.7. Ablation experiment

To assess the impacts of the DFM, MDFM and MFSM on the overall performance of the proposed MFFNet, we conducted ablation experiments on the GF7-CD dataset. The Baseline configuration removes all three modules (DFM, MDFM and MFSM) from MFFNet, resulting in a significant decline across nearly all evaluation metrics compared to the full MFFNet. The quantitative results of these ablation experiments are presented in Table 4.

Table 4. Accuracy evaluation results of the ablation experiment.

4.7.1. Effectiveness of MFSM

Model A is an extension of the Baseline model, incorporating the MFSM. As shown in Table 4, Model A exhibits improvements in overall accuracy, Precision, Recall and F1-Score of 3.32%, 8.56%, 11.7% and 10.09%, respectively, compared to the Baseline model. Furthermore, as illustrated in Figure 11, Model A shows significantly enhanced building outlines in comparison to the Baseline model. This improvement can be attributed to the MFSM, which not only integrates multilevel features but also considers the pixel relationships within the image.

Figure 11. Comparison of ablation experiment results.


4.7.2. Effectiveness of MDFM

Model B introduces the MDFM into the Baseline model. As shown in Table 4, all quantitative metrics improve compared to the Baseline model. Additionally, as depicted in Figure 11, a noticeable reduction in false detections and omissions can be observed when comparing Model B with the Baseline model. This improvement can be attributed to the MDFM's effective management of the semantic gap and reduction of information redundancy.

4.7.3. Effectiveness of DFM

Model C incorporates both the DFM and the MDFM into the Baseline model. As shown in Table 4, all quantitative indicators improve compared to Model B. Analysis of Figure 11 reveals that Model C exhibits enhanced sensitivity to small target changes and fewer false detections and omissions than Model B. This can be attributed to the DFM's more effective integration of geometric and spectral building information.

As shown in Table 4, MFFNet is best in all quantitative indicators except for the recall rate, which is slightly lower than that of Model A. The four sets of ablation experiments show that the three modules, DFM, MDFM and MFSM, provide significant improvements in the final CD results, which proves their validity and reliability.

5. Discussion

In this section, we discuss the role of geometric information in the proposed model. To better understand the significance of geometric information, we conducted comparative experiments on the two datasets, 3DCD and GF7-CD. We used MFFNet as the baseline network and removed the DSMs from the data sources to construct a comparison model, Model D. The results are presented in Tables 5 and 6.

Table 5. Experimental results comparing Model D and MFFNet on 3DCD.

Table 6. Experimental results comparing Model D and MFFNet on GF7-CD.

The results of the quantitative experiments indicate that removing the DSM data source leads to decreases in all quantitative indicators. For instance, the F1-Scores are 6.81% and 37.83% lower on the GF7-CD and 3DCD datasets, respectively. This further supports the significant role of geometric information in BCD. To provide a more intuitive demonstration of the impact of geometric information on the models, we performed attention visualization on their resulting maps. In this visualization, red represents focused attention while blue indicates distracted attention. As shown in Figure 12, the colour of the bridge in Image 2 is similar to that of the road in Image 1. Consequently, Model D, which does not consider geometric information, only focuses on the white area surrounding the bridge, and a significant number of omissions occur. However, MFFNet mitigates this issue by incorporating elevation information, namely the change in bridge height between DSM1 and DSM2. This allows the model to identify the specific area where the height changes, reducing omissions. Figure 12 also shows noticeable changes in the houses, surrounding roads and vegetation between Image 1 and Image 2. Model D is adversely affected by these irrelevant changes, as well as by shadows, which impairs its ability to concentrate on the houses. In contrast, MFFNet effectively directs attention towards the houses by leveraging the information on changes in house elevation present in DSM1 and DSM2. In another example in Figure 12, the buildings in Image 2 are obscured by shadows, making their boundaries unclear; however, the buildings in DSM2 are not affected by shadows. Leveraging this, MFFNet allocates more attention to capturing the building outlines, leading to sharper and more distinct building outlines. Finally, Figure 12 includes a scene in which both Image 1 and Image 2 contain buildings (with height differences) and flower beds (without height differences). The attention map of Model D reveals that it continues to concentrate on the centre and outlines of the flower beds, while the outlines of the buildings remain indistinct. In contrast, MFFNet, which incorporates geometric information, demonstrates superior attention towards building contours, consequently minimizing false detections (such as mistaking flower beds with similar shape characteristics for buildings).

Figure 12. Experimental results visualizing the attention of Model D and MFFNet.


The quantitative and qualitative results demonstrate that incorporating geometric information effectively enhances the accuracy of BCD. This approach reduces false detections and omissions caused by features with similar spectral and shape characteristics as buildings, and also mitigates the impacts of shadows and vegetation. Furthermore, it enables the extraction of more comprehensive building edges.

6. Conclusions

BCD based on RSIs is an important means of obtaining information on ground changes. This study proposes MFFNet to solve the problems caused by shadows, vegetation and changes in objects with spectral and morphological characteristics similar to those of buildings; such problems easily lead to false detections, omissions and incomplete patch edges in BCD. MFFNet consists of the DPEN, the MDFM and the MFSM. The DPEN allows better extraction of spectral and geometric features, the MDFM enables efficient combination of the extracted features and the MFSM considers the relationships between global contexts. Together, these modules provide a more efficient combination of spectral and geometric information. The effectiveness of MFFNet in BCD was demonstrated experimentally.

To evaluate the performance of the proposed MFFNet, we conducted a series of experiments, including comparative experiments on two datasets and ablation experiments on the important modules. The results show that MFFNet significantly reduces false detections and omissions compared to other current network models. Specifically, it achieves F1-Scores of 91.44% and 78.70% on the GF7-CD and 3DCD datasets, respectively. We believe that obtaining additional information about buildings will help to further improve BCD results.

The experiments have the following shortcoming: because MFFNet requires both optical images and DSMs, and popular public datasets usually contain only optical images, validation was only performed on two datasets. Therefore, the model's generalization ability remains to be verified more broadly, and we will continue to study the generalization of this model.

Authors’ contributions

Conceptualization, Z.H.G. and J.P.P.; methodology, Z.H.G., J.P.P. and P.X.; validation, Z.H.G.; formal analysis, P.X., C.Q., X.X.W., Y.H.Y., Y.W. and L.Z.; resources, H.J.Z. and H.Z.R.; writing—original draft preparation, Z.H.G.; writing—review and editing, J.P.P. and Z.H.G. All authors have read and agreed to the published version of the manuscript.

Acknowledgement

We thank the authors of the 3DCD dataset.

Disclosure statement

The authors declare no conflict of interest. We sincerely appreciate the helpful comments and suggestions of the academic editors and reviewers.

Additional information

Funding

This work was supported by the Key R&D Program of Ningxia Autonomous Region: Ecological environment monitoring and platform development of the ecological barrier protection system for Helan Mountain (2022CMG02014); the Chongqing Graduate Joint Training Base Construction Project (JDLHPYJD2019004); and the Open Fund of the Key Laboratory of Monitoring, Evaluation and Early Warning of Territorial Spatial Planning Implementation, Ministry of Natural Resources (No. LMEE-KF2023001).

References

  • Bai T, Sun K, Deng S, Li D, Li W, Chen Y. 2018. Multi-scale hierarchical sampling change detection using Random Forest for high-resolution satellite imagery. Int J Remote Sens. 39(21):7523–7546. doi:10.1080/01431161.2018.1471542.
  • Bovolo F, Bruzzone L, Marconcini M. 2008. A novel approach to unsupervised change detection based on a semisupervised SVM and a similarity measure. IEEE Trans Geosci Remote Sens. 46(7):2070–2082. doi:10.1109/TGRS.2008.916643.
  • Chaabouni-Chouayakh H, Krauss T, d‘Angelo P, Reinartz P. 2010. 3D change detection inside urban areas using different digital surface models. Germany: Remote Sensing Technology Institute German Aerospace Center (DLR). https://elib.dlr.de/65901/1/ISPRSmy_paper.pdf.
  • Chaabouni-Chouayakh H, d'Angelo P, Krauss T, Reinartz P. 2011. Automatic urban area monitoring using digital surface models and shape features. 2011 Joint Urban Remote Sensing Event, Munich, Germany. 2011, pp. 85–88. IEEE. doi: 10.1109/JURSE.2011.5764725.
  • Chaabouni-Chouayakh H, Reinartz P. 2011. Towards automatic 3D change detection inside urban areas by combining height and shape information. PFG. 2011(4):205–217. doi:10.1127/1432-8364/2011/0083.
  • Chen B, Chen Z, Deng L, Duan Y, Zhou J. 2016. Building change detection with RGB-D map generated from UAV images. Neurocomputing. 208:350–364. doi:10.1016/j.neucom.2015.11.118.
  • Chen H, Qi Z, Shi Z. 2022. Remote sensing image change detection with transformers. IEEE Trans Geosci Remote Sens. 60:1–14. doi:10.1109/TGRS.2021.3095166.
  • Chen H, Shi Z. 2020. A spatial-temporal attention-based method and a new dataset for remote sensing image change detection. Remote Sens. 12(10):1662. doi:10.3390/rs12101662.
  • Chen H, Wu C, Du B, Zhang L, Wang L. 2020. Change detection in multisource VHR images via deep siamese convolutional multiple-layers recurrent neural network. IEEE Trans Geosci Remote Sens. 58(4):2848–2864. doi:10.1109/TGRS.2019.2956756.
  • Chen Z, Zhou Y, Wang B, Xu X, He N, Jin S, Jin S. 2022. EGDE-Net: a building change detection method for high-resolution remote sensing imagery based on edge guidance and differential enhancement. ISPRS J Photogramm Remote Sens. 191:203–222. doi:10.1016/j.isprsjprs.2022.07.016.
  • Coppin P, Jonckheere I, Nackaerts K, Muys B, Lambin E. 2004. Digital change detection methods in ecosystem monitoring: a review. Int J Remote Sens. 25(9):1565–1596. doi:10.1080/0143116031000101675.
  • Daudt RC, Le Saux B, Boulch A. 2018. Fully convolutional siamese networks for change detection. 25th IEEE International Conference on Image Processing (ICIP); IEEE.
  • Désir C, Bernard S, Petitjean C, Heutte L. 2012. A random forest based approach for one class classification in medical imaging. Machine Learning in Medical Imaging: Third International Workshop, MLMI 2012, Held in Conjunction with MICCAI 2012, Oct 1; Nice, France: Springer.
  • Du S, Zhang Y, Qin R, Yang Z, Zou Z, Tang Y, Fan C. 2016. Building change detection using old aerial images and new LiDAR data. Remote Sens. 8(12):1030. doi:10.3390/rs8121030.
  • Fang S, Li K, Shao J, Li Z. 2022. SNUNet-CD: a densely connected Siamese network for change detection of VHR images. IEEE Geosci Remote Sens Lett. 19:1–5. doi:10.1109/LGRS.2021.3056416.
  • Feng X, Li P. 2019. Urban built-up area change detection using multi-band temporal texture and one-class random forest. 2019 10th International Workshop on the Analysis of Multitemporal Remote Sensing Images (MultiTemp); IEEE.
  • Gedara W, Bandara C, Patel VM. 2022. A transformer-based siamese network for change detection. IGARSS 2022-2022 IEEE International Geoscience and Remote Sensing Symposium; IEEE.
  • Hafner S, Nascetti A, Azizpour H, Ban Y. 2022. Sentinel-1 and sentinel-2 data fusion for urban change detection using a dual stream u-net. IEEE Geosci Remote Sens Lett. 19:1–5. doi:10.1109/LGRS.2021.3119856.
  • He K, Zhang X, Ren S, Sun J. 2016. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • He Y, Chen S. 2019. Recent advances in 3D data acquisition and processing by time-of-flight camera. IEEE Access. 7:12495–12510. doi:10.1109/ACCESS.2019.2891693.
  • Hou X, Bai Y, Li Y, Shang C, Shen Q. 2021. High-resolution triplet network with dynamic multiscale feature for change detection on satellite images. ISPRS J Photogramm Remote Sens. 177:103–115. doi:10.1016/j.isprsjprs.2021.05.001.
  • Hu J, Shen L, Sun G. 2018. Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. 2017. Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • Jabari S, Rezaee M, Fathollahi F, Zhang Y. 2019. Multispectral change detection using multivariate Kullback-Leibler distance. ISPRS J Photogramm Remote Sens. 147:163–177. doi:10.1016/j.isprsjprs.2018.11.014.
  • Jung F. 2004. Detecting building changes from multitemporal aerial stereopairs. ISPRS J Photogramm Remote Sens. 58(3–4):187–201. doi:10.1016/j.isprsjprs.2003.09.005.
  • Lak AM, Zoej MJV, Mokhtarzade M. 2016. A new method for road detection in urban areas using high-resolution satellite images and Lidar data based on fuzzy nearest-neighbor classification and optimal features. Arab J Geosci. 9(5):358. doi:10.1007/s12517-016-2374-1.
  • Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S. 2017. Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • Long J, Shelhamer, E, Darrell, T. 2015. Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • Lu D, Mausel P, Brondízio E, Moran E. 2004. Change detection techniques. Int J Remote Sens. 25(12):2365–2401. doi:10.1080/0143116031000139863.
  • Malpica JA, Alonso MC, Papí F, Arozarena A, Martínez De Agirre A. 2013. Change detection of buildings from satellite imagery and lidar data. Int J Remote Sens. 34(5):1652–1675. doi:10.1080/01431161.2012.725483.
  • Marsocci V, Coletta V, Ravanelli R, Scardapane S, Crespi M. 2023. Inferring 3D change detection from bitemporal optical images. ISPRS J Photogramm Remote Sens. 196:325–339. doi:10.1016/j.isprsjprs.2022.12.009.
  • Nebiker S, Lack N, Deuber M. 2014. Building change detection from historical aerial photographs using dense image matching and object-based image analysis. Remote Sens. 6(9):8310–8336. doi:10.3390/rs6098310.
  • Pan J, Cui W, An X, Huang X, Zhang H, Zhang S, Zhang R, Li X, Cheng W, Hu Y. 2022. MapsNet: multi-level feature constraint and fusion network for change detection. Int J Appl Earth Obs Geoinf. 108:102676. doi:10.1016/j.jag.2022.102676.
  • Pan J, Li X, Cai Z, Sun B, Cui W. 2022. A self-attentive hybrid coding network for 3D change detection in high-resolution optical stereo images. Remote Sens. 14(9):2046. doi:10.3390/rs14092046.
  • Pang S, Hu X, Cai Z, Gong J, Zhang M. 2018. Building change detection from bi-temporal dense-matching point clouds and aerial images. Sensors. 18(4):966. doi:10.3390/s18040966.
  • Pang S, Hu X, Zhang M, Cai Z, Liu F. 2019. Co-segmentation and superpixel-based graph cuts for building change detection from bi-temporal digital surface models and aerial images. Remote Sens. 11(6):729. doi:10.3390/rs11060729.
  • Peng D, Zhang Y, Guan H. 2019. End-to-end change detection for high resolution satellite images using improved UNet++. Remote Sens. 11(11):1382. doi:10.3390/rs11111382.
  • Qin R. 2014. Change detection on LOD 2 building models with very high resolution spaceborne stereo imagery. ISPRS J Photogramm Remote Sens. 96:179–192. doi:10.1016/j.isprsjprs.2014.07.007.
  • Qin R, Tian J, Reinartz P. 2016. 3D change detection–approaches and applications. ISPRS J Photogramm Remote Sens. 122:41–56. doi:10.1016/j.isprsjprs.2016.09.013.
  • Ronneberger O, Fischer P, Brox T. 2015. U-net: convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Proceedings, Part III 18, Oct 5–9; Munich, Germany: Springer.
  • Sasagawa A, Baltsavias E, Kocaman-Aksakal S, Wegner JD. 2013. Investigation on automatic change detection using pixel-changes and DSM-changes with ALOS-PRISM triplet images. Int Arch Photogramm Remote Sens Spatial Inf Sci. XL-7/W2(7/W2):213–217. doi:10.5194/isprsarchives-XL-7-W2-213-2013.
  • Shafique A, Cao G, Khan Z, Asad M, Aslam M. 2022. Deep learning-based change detection in remote sensing images: a review. Remote Sens. 14(4):871. doi:10.3390/rs14040871.
  • Shi W, Caballero J, Huszár F, Totz J, Aitken AP, Bishop R, Rueckert D, Wang Z. 2016. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • Simonyan K, Zisserman A. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:14091556.
  • Singh A. 1989. Review article digital change detection techniques using remotely-sensed data. Int J Remote Sens. 10(6):989–1003. doi:10.1080/01431168908903939.
  • Tian J, Chaabouni-Chouayakh H, Reinartz P, Krauss T, d‘Angelo P. 2010. Automatic 3D change detection based on optical satellite stereo imagery. Int Arch Photogramm Remote Sens Spatial Information Sci. 38(7B):586–591.
  • Tian J, Cui S, Reinartz P. 2014. Building change detection based on satellite stereo imagery and digital surface models. IEEE Trans Geosci Remote Sens. 52(1):406–417. doi:10.1109/TGRS.2013.2240692.
  • Tian J, Chaabouni-Chouayakh H, Reinartz P. 2011. 3D building change detection from high resolution spaceborne stereo imagery. 2011 International Workshop on Multi-Platform/Multi-Sensor Remote Sensing and Mapping; IEEE.
  • Tian J, Nielsen AA, Reinartz P. 2015. Building damage assessment after the earthquake in Haiti using two post-event satellite stereo imagery and DSMs. Int J Image Data Fusion. 6(2):155–169. doi:10.1080/19479832.2014.1001879.
  • Tian J, Reinartz P, d’Angelo P, Ehlers M. 2013. Region-based automatic building and forest change detection on Cartosat-1 stereo imagery. ISPRS J Photogramm Remote Sens. 79:226–239. doi:10.1016/j.isprsjprs.2013.02.017.
  • Tu J, Li D, Feng W, Han Q, Sui H. 2017. Detecting damaged building regions based on semantic scene change from multi-temporal high-resolution remote sensing images. IJGI. 6(5):131. doi:10.3390/ijgi6050131.
  • Turker M, Cetinkaya B. 2005. Automatic detection of earthquake‐damaged buildings using DEMs created from pre‐and post‐earthquake stereo aerial photographs. Int J Remote Sens. 26(4):823–832. doi:10.1080/01431160512331316810.
  • Waser L, Baltsavias E, Ecker K, Eisenbeiss H, Feldmeyer-Christe E, Ginzler C, Küchler M, Zhang L. 2008. Assessing changes of forest area and shrub encroachment in a mire ecosystem using digital surface models and CIR aerial images. Remote Sens Environ. 112(5):1956–1968. doi:10.1016/j.rse.2007.09.015.
  • Wei L, Lu M, Chen X. 2015. Automatic change detection of urban land-cover based on SVM classification. 2015 IEEE international geoscience and remote sensing symposium (IGARSS); IEEE.
  • Woo S, Park J, Lee J-Y, Kweon IS. 2018. CBAM: convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV).
  • Xie Z, Wang M, Han Y, Yang D. 2019. Hierarchical decision tree for change detection using high resolution remote sensing images. Geo-informatics in Sustainable Ecosystem and Society: 6th International Conference, GSES 2018, Sept. 25–26, Handan, China: Springer.
  • Yang X, Li S, Chen Z, Chanussot J, Jia X, Zhang B, Li B, Chen P. 2021. An attention-fused network for semantic segmentation of very-high-resolution remote sensing imagery. ISPRS J Photogramm Remote Sens. 177:238–262. doi:10.1016/j.isprsjprs.2021.05.004.
  • Yang Y, Chen C, Yang B, Hu P, Cui Y. 2021. 3D change detection of buildings based on multi-level segmentation of dense matching point clouds from UAV images. Geomat Inf Sci Wuhan Univ. 46(4):489–496.
  • Ye S, Liu D, Yao X, Tang H, Xiong Q, Zhuo W, Du Z, Huang J, Su W, Shen S, et al. 2018. RDCRMG: a raster dataset clean & reconstitution multi-grid architecture for remote sensing monitoring of vegetation dryness. Remote Sens. 10(9):1376. doi:10.3390/rs10091376.
  • Zhang C, Yue P, Tapete D, Jiang L, Shangguan B, Huang L, Liu G. 2020. A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images. ISPRS J Photogramm Remote Sens. 166:183–200. doi:10.1016/j.isprsjprs.2020.06.003.
  • Zhang H, Wang M, Wang F, Yang G, Zhang Y, Jia J, Wang S. 2021. A novel squeeze-and-excitation W-Net for 2D and 3D building change detection with multi-source and multi-feature remote sensing data. Remote Sens. 13(3):440. doi:10.3390/rs13030440.
  • Zhang R, Zhang H, Ning X, Huang X, Wang J, Cui W. 2023. Global-aware siamese network for change detection on remote sensing images. ISPRS J Photogramm Remote Sens. 199:61–72. doi:10.1016/j.isprsjprs.2023.04.001.
  • Zhigao Y, Qianqing Q, Qifeng Z. 2006. Change detection in high spatial resolution images based on support vector machine. 2006 IEEE International Symposium on Geoscience and Remote Sensing; IEEE.
  • Zhu Q, Guo X, Deng W, Shi S, Guan Q, Zhong Y, Zhang L, Li D. 2022. Land-use/land-cover change detection based on a Siamese global learning framework for high spatial resolution remote sensing imagery. ISPRS J Photogramm Remote Sens. 184:63–78. doi:10.1016/j.isprsjprs.2021.12.005.
  • Zuxun Z, Huiwei J, Shiyan P, Xiangyun H. 2022. Review and prospect in change detection of multi-temporal remote sensing images. Acta Geod Cartogr Sin. 51(7):1091.