Research Article

Global structure graph mapping for multimodal change detection

Article: 2347457 | Received 27 Dec 2023, Accepted 19 Apr 2024, Published online: 09 May 2024

ABSTRACT

Multimodal change detection (MCD) combines multiple remote sensing data sources to realize surface change monitoring, which is essential for disaster evaluation and environmental monitoring. However, due to the ‘incomparable’ features of multimodal data, traditional change detection methods designed for unimodal (homogeneous) data no longer apply. To address this issue, this paper proposes a novel MCD method, global structure graph mapping (GSGM), which extracts the ‘comparable’ structural features of multimodal data and constructs a global structure graph (GSG) to express the structural information of each multi-temporal image; the GSGs are then cross-mapped to the other data domain. The change intensity (CI) is determined by measuring the change of the GSGs after mapping and the differences between the GSGs and the mapped GSGs. The forward and backward CI maps (CIMs) are then fused with the latent low-rank representation (LLRR) method, and the change map (CM) is obtained by threshold segmentation. Experiments on five multimodal and four unimodal datasets demonstrate the effectiveness and robustness of the proposed method (the source code is available at https://github.com/rshante0426/GSGM).

1. Introduction

1.1. Background

Change detection (CD) detects and analyzes changes on the Earth’s surface by acquiring remote sensing images from different times in the same geographical area (Han et al. Citation2024; Reba and Seto Citation2020). It is widely used in urban studies (Y. Tang and Zhang Citation2017; Y. Tang et al. Citation2024; Y. Chen et al. Citation2022; Y. Chen et al. Citation2022; Y. Chen et al. Citation2024), forest resource monitoring (Wei et al. Citation2023), and disaster warning and rescue (Brunner, Lemoine, and Bruzzone Citation2010; D. Wang et al. Citation2022). Nowadays, the majority of CD techniques are used for unimodal CD (UCD), such as homogeneous multispectral, hyperspectral, and synthetic aperture radar (SAR) images (Bovolo and Bruzzone Citation2007; Saha, Bovolo, and Bruzzone Citation2019; H. Tang, Wang, and Zhang Citation2022). These images are acquired from the same sensor and have the same image characteristics, allowing for the extraction of change information through direct comparison. However, in many practical applications, it is difficult to obtain a set of high-quality unimodal images due to the influence of satellite performance, the environment, etc. The surge in remote sensing satellites has facilitated the integration of multimodal satellite data (Fan, Hou, and Shi Citation2021; Rajakumar and Satheeskumaran Citation2022). This integration, which leverages the fusion of imagery from various sensors, has significantly boosted the Earth observation capabilities of remote sensing images (Hong et al. Citation2023). As a result, it has attracted considerable interest within the academic community, highlighting the potential for more comprehensive and detailed insights into our planet's conditions (Hong et al. Citation2023; Hong et al. Citation2024). Moreover, it is difficult for UCD methods to achieve satisfactory CD results when the data comes from different satellites.

Multimodal CD (MCD) is defined by its analysis of pre-event and post-event images that originate from different sensor types or distinct satellites. By harnessing the power of diverse satellite imagery, MCD can combine optical images with SAR images, or compare images from different satellites such as Sentinel-2 and GaoFen-2. The essence of MCD lies in the use of pre-event and post-event data acquired by sensors that vary in nature or platform, which introduces a range of advantages over UCD: 1) it is capable of concurrent image capture, enabling the fusion of complementary information from multiple sources (e.g. the spectral and textural information of optical images, and the all-weather imaging capability of SAR); 2) obtaining pre- and post-event images within a compressed timeframe reduces disaster response latency. As a result, MCD has become a popular research topic. In recent years, it has been explored through various methods, which can be divided into three categories based on their principles: image space transformation, feature space unification, and similarity measure.

1) Image space transformation. To transform multimodal images into each other's image space and extract the change information, most of these methods build a spatial transformation relationship between the images using a statistical model. Z. Liu et al. (Citation2018) constructed the image space transformation relationship between optical and SAR images by manually selecting samples from unchanged regions and obtained results by directly comparing the transformed images with the original images. To reduce the reliance on a priori information, Luppino et al. (Citation2019) established a multimodal image space transformation relationship by creating pseudo-samples using affinity matrices and four regression models. Additionally, deep learning-based MCD methods have been proposed, such as the coupling translation network (CPTN) (Gong et al. Citation2019) and the adversarial cyclic encoders network (ACE) (Luppino et al. Citation2022). However, due to the differences between multimodal images and the reliance on the collection of training samples or pseudo-samples, there are still significant discrepancies between the original and transformed images.

2) Feature space unification. Multimodal images obtained from various sensors possess dissimilar imaging characteristics. Thus, to enable direct comparison, the images can be transformed into a unified feature space, allowing identical geographical objects across multimodal images to share common spectral features. This type of method can be divided into category feature space unification and latent feature space unification. Category feature space unification, also known as image classification-based methods, assigns the same category label to the same image objects, such as hierarchical extreme learning machine classification (HELMC) (Han et al. Citation2021) and multitemporal segmentation and compound classification (MS-CC) (L. Wan, Xiang, and You Citation2019). However, these methods are limited by the classification accuracy of multimodal images, particularly SAR images affected by coherent speckle noise. Latent feature space unification enables the comparison of multimodal images by projecting them into a unified feature space, and is mainly based on deep learning, such as the symmetric convolutional coupling network (SCCN) (J. Liu et al. Citation2018), approximately symmetrical deep neural network (ASDNN) (Zhao et al. Citation2017), two-stage joint feature learning (TSJFL) (Han, Tang, and Chen Citation2022), iterative feature mapping network (IFMN) (Zhan et al. Citation2018), commonality autoencoder change detection (CACD) (Wu et al. Citation2022), conditional generative adversarial network (CGAN) (Gong et al. Citation2017), logarithmic transformation feature learning (LTFL) (Zhan et al. Citation2018), etc. However, the existence of changes makes it challenging to uniformly match the spectral properties of all image objects in multimodal images.

3) Similarity measure. This type of method distinguishes between changed and unchanged image pixels by measuring the similarity of certain features of the multimodal images, with lower similarity indicating a higher probability of change. Mercier, Moser, and Serpico (Citation2008) constructed the relationships of unchanged image pixels based on copula theory and extracted change information using the Kullback-Leibler distance. Wan et al. (L. Wan, Zhang, and You Citation2018) detected the changed regions by comparing the ‘ranked histogram’ features of optical and SAR images. Alberga (Citation2009) obtained the changed regions by calculating the similarity of multimodal images using mutual information. Additionally, some researchers have calculated the similarity of the grayscale features of multimodal images by constructing models, such as the multidimensional statistical model (MSM) (Prendes et al. Citation2015), energy-based model (EBM) (Touati and Mignotte Citation2018), convolution model-based mapping (CMBM) (Touati, Mignotte, and Dahmane Citation2019), multimodal change detection Markovian model (M3CD) (Touati, Mignotte, and Dahmane Citation2020), and graph-based fusion (GBF) (Jimenez-Sierra et al. Citation2020). Although existing methods can accurately measure change information in complex change scenes, the construction of appropriate features is often challenging. Furthermore, the similarity measurement between multimodal images is further complicated by the significant noise in SAR images, leading to an increased false detection rate.

1.2. Motivation and contribution

Despite the considerable differences in imaging features between multimodal images, some researchers have proposed that there are common structural features that can be expressed through a graph based on the self-similarity attributes of the image. In Sun, Lei, Li, et al. (Citation2021), a non-local patch-based graph (NLPG) is constructed in two images separately, and the changes in the NLPG are produced by cross-mapping the pre- and post-temporal images. Based on the NLPG, they proposed a novel graph difference metric formula to calculate the divergence between the structure graphs of multimodal images. Additionally, Mignotte (Citation2020) extracted the structural fractal features of the image and mapped them to the domain of the other image, then employed a fractal decoding strategy to obtain the change information and combined it with Markovian segmentation to identify the changed regions. Subsequently, the iterative robust graph and Markovian co-segmentation method (IRG-McS) was proposed (Sun et al. Citation2021), which acquires a graph structure by updating the graph, obtains the optimal difference graph through iteration, and combines it with Markovian segmentation to obtain the change map (CM). Moreover, the sparse-constrained adaptive structure consistency-based method (SCASC) builds a structural consistency image regression model and uses the Markov segmentation model to differentiate changed and unchanged regions in different images (Sun et al. Citation2022). Chen et al. (Citation2022) used graph autoencoders to characterize the structural features of multimodal images and extract the changed regions by comparing the structural relationships across different modalities.

Although these methods demonstrate the efficacy of the structure graph method for MCD, some problems remain: 1) Existing methods, when constructing structural graphs, predominantly concentrate on the average pixel differences between image patches, overlooking the overall similarity of the patches. This focus can amplify noise and outliers, leading to inaccurate similarity measurements. Additionally, these methods often rely on non-local information, which does not capture the full context of the image, including the layout and relationships between objects, resulting in an incomplete representation of image structural features. 2) There is a lack of adequate consideration for structural changes, with a uniform weight applied to all image patches for change measurement. This approach fails to account for the nuanced differences between structural features and their mapped counterparts within the same image domain. Different image patches, representing various land cover types such as buildings, vegetation, and water bodies, respond differently to changes, and their significance varies. A uniform weight does not consider this diversity, which can lead to over-sensitivity or neglect of certain land cover types, ultimately affecting the overall performance of CD. 3) The current focus on local feature fusion in change intensity maps (CIMs) overlooks the importance of global features. While local features are prone to noise and outliers, which can result in false positives, global features offer more stable information that enhances the robustness of CD.

In this paper, we propose a global structure graph mapping (GSGM)-based MCD method that fully utilizes the global structure information of images and improves the accuracy of CD. It segments the image into overlapping patches and forms a global structure graph (GSG) that optimally employs global patches to represent the image's structural features. Vertices in GSG mainly consist of target and vertex patches (for convenience, image patches in GSG besides the target ones are referred to as ‘vertex patches’ in this paper). The proposed GSGM exploits the overall similarity between target and vertex patches to construct a robust structure graph for each of the multimodal images, which are mapped to each other's image domains to explore the change information of multimodal images. Subsequently, a weighted formula is established to calculate the change intensity (CI) of the image, which stems from both the GSGs post-mapping and the disparities between the GSGs and their corresponding mapped versions within the same image domain. Furthermore, GSGM fuses the forward and backward CIMs using the low-rank representation fusion technique to enhance the expression of CI, and the final CM is obtained by threshold segmentation. In summary, the contributions of this paper are as follows:

  1. The similarity at the patch level between image patches is assessed, offering a nuanced understanding of their relationship. A GSG is introduced to encapsulate the image's structural features, providing a holistic view of the image's layout and structure. This approach, which focuses on both local details and global composition, allows for a comprehensive representation of image content. The GSG evaluates both similarities and dissimilarities among vertices, which aids in more accurate change localization.

  2. A novel weighted structural difference formula has been developed to more precisely capture the significance of individual image patches in change measurement. This method assigns variable weights to each patch, enabling a more granular assessment of structural changes. Furthermore, the changes in the GSGs after mapping, as well as discrepancies between the GSGs and mapped GSGs in the same image domain, are considered, enhancing the robustness of the change measurement approach.

  3. The methodology also incorporates low-rank representation to fuse forward and backward CIMs, integrating global features such as the image's overall structure or background patterns with local features that capture detailed information. This integration results in a more robust CIM.

  4. The proposed GSGM is fully unsupervised and does not require any labeled data, and the effectiveness of GSGM is validated by experiments conducted across nine datasets, where it was compared with state-of-the-art methods.

2. Methodology

Consider two co-registered multimodal images $X=\{x(i,j,b)\mid 1\le i\le H,\ 1\le j\le W,\ 1\le b\le B_X\}$ and $Y=\{y(i,j,b)\mid 1\le i\le H,\ 1\le j\le W,\ 1\le b\le B_Y\}$. Here, $H$ and $W$ denote the height and width of the images, and $B_X$ and $B_Y$ represent the number of bands of images $X$ and $Y$ respectively. Because multimodal images $X$ and $Y$ come from distinct data domains, a direct comparison of their spectral characteristics is impractical. However, in regions that remain unchanged, they share consistent structural attributes. To explore these attributes, we use GSG encoding, which encodes the structural properties of $X$ and $Y$ into their respective graphs and thereby captures the global structural features of the images more accurately. By creating GSGs that capture the inherent structural similarities and differences between $X$ and $Y$, we can effectively extract the nuances of changes in multimodal imagery. These structural attributes are manifested in the connectivity patterns between vertices and their neighboring vertices within the graph; changes in a region will consequently modify the connectivity patterns of the corresponding vertices. In constructing the structure graph, each image patch is considered a distinct vertex, encapsulating features such as color and texture that represent a specific image area. The relationships among these vertices reveal the images’ patterns, hierarchies, and spatial arrangements. Comparing the GSGs of $X$ and $Y$ allows us to quantify the changes between the multimodal images. As illustrated in the framework (Figure 1), the proposed method encompasses three key stages: (1) construction of a GSG using image patches as the fundamental processing units; (2) evaluation of image CI through forward/backward mapping of the GSG; and (3) generation of a CM through CIMs fusion and threshold segmentation.

Figure 1. Framework of GSGM-based MCD method.


2.1. GSG construction

Consider a target patch in image $X$, defined as $p_{(i,j)}^X = X(i-p_s:i+p_s,\ j-p_s:j+p_s,\ 1:B_X)$, $1 \le i \le H$, $1 \le j \le W$, whose width is $w_s = 2p_s + 1$. To make full use of the structural features of the images, this paper constructs a GSG $G_{(i,j)}$ to represent the global structural features of the images. Taking the target patch $p_{(i,j)}^X$ as an example, its GSG $G_{(i,j)}^X$ is defined as follows:
$$
\begin{aligned}
G_{(i,j)}^X &= \{V_{(i,j)}^X,\ E_{(i,j)}^X,\ w_{(i,j)}^X\} \\
V_{(i,j)}^X &= \{p_{(i,j)}^X,\ p_{(m,n)}^X \mid (m,n) \in \Omega_{(m,n)}^X,\ (i,j) \ne (m,n)\}, \quad |V_{(i,j)}^X| = N \\
E_{(i,j)}^X &= \{(p_{(i,j)}^X, p_{(m,n)}^X) \mid p_{(i,j)}^X, p_{(m,n)}^X \in V_{(i,j)}^X\} \\
w^X(p_{(i,j)}^X, p_{(m,n)}^X) &= \mathrm{sim}(p_{(i,j)}^X, p_{(m,n)}^X), \quad (p_{(i,j)}^X, p_{(m,n)}^X) \in E_{(i,j)}^X
\end{aligned}
\tag{1}
$$
where $V_{(i,j)}^X$ denotes the vertex set of GSG $G_{(i,j)}^X$, and each vertex patch $p_{(m,n)}^X$ is connected to the target patch $p_{(i,j)}^X$ through an edge in $E_{(i,j)}^X$. In GSG $G_{(i,j)}^X$, the number of vertex patches is $N$. The similarity between each vertex patch $p_{(m,n)}^X$ and the target patch $p_{(i,j)}^X$ gives its connection weight $w_{(i,j)}^X$, and $\mathrm{sim}(\cdot)$ represents the similarity calculation operation. $\Omega_{(m,n)}^X$ is an index set of the vertex patches $p_{(m,n)}^X$ arranged in descending order of their similarity to the target patch $p_{(i,j)}^X$.
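To make the construction concrete, the following sketch (not the authors' released MATLAB code) builds the GSG of a single target patch: vertex patches are sampled over the whole image with a step `dv`, ranked by their similarity to the target patch, and the ranked index set Ω and the connection weights are returned. The helper names `extract_patch` and `ssim_patch` and the sampling grid are assumptions made for illustration.

```python
import numpy as np

def extract_patch(img, i, j, ps):
    """Return the (2*ps+1)-wide patch of img (H x W x B) centred at pixel (i, j)."""
    return img[i - ps:i + ps + 1, j - ps:j + ps + 1, :]

def build_gsg(img, i, j, ps, dv, ssim_patch):
    """Construct the GSG of the target patch at (i, j), following Eq. (1).

    Every patch sampled on a grid with step `dv` becomes a vertex patch; its edge
    weight to the target patch is the SSIM-based similarity. Returns the index set
    Omega (vertex positions sorted by decreasing similarity) and the weights w.
    """
    H, W, _ = img.shape
    target = extract_patch(img, i, j, ps)
    positions, weights = [], []
    for m in range(ps, H - ps, dv):
        for n in range(ps, W - ps, dv):
            if (m, n) == (i, j):
                continue                      # skip the target patch itself
            positions.append((m, n))
            weights.append(ssim_patch(target, extract_patch(img, m, n, ps)))
    order = np.argsort(weights)[::-1]         # descending similarity
    omega = [positions[k] for k in order]     # ranked index set Omega
    w = np.asarray(weights)[order]            # connection weights
    return omega, w
```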

Similarly, the GSG $G_{(i,j)}^Y$ of image $Y$ can be constructed in the same way.

2.2. Patch similarity measure

The high correlation between adjacent pixels in remote sensing images has been widely acknowledged. However, traditional methods for measuring image patch similarity are based on differences between independent pixels, which makes them vulnerable to noise and thus leads to inaccurate similarity measurements. To overcome this limitation of pixel-difference-based similarity measurement, this paper introduces the structural similarity index measure (SSIM) (Z. Wang et al. Citation2004), which evaluates image similarity by taking into account image brightness, contrast, and structure. SSIM is defined as follows:
$$\mathrm{SSIM}(x,y) = [l(x,y)]^{\alpha}\,[c(x,y)]^{\beta}\,[s(x,y)]^{\gamma}\tag{2}$$
where
$$l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}\tag{3}$$
$$c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}\tag{4}$$
$$s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}\tag{5}$$
where $l(x,y)$, $c(x,y)$, and $s(x,y)$ represent the brightness, contrast, and structure comparison functions of image patches $x$ and $y$ respectively; $\alpha>0$, $\beta>0$, and $\gamma>0$ are adjustment parameters; $\mu_x$ and $\mu_y$ are the mean values of $x$ and $y$; $\sigma_x$ and $\sigma_y$ are the standard deviations of $x$ and $y$; $\sigma_{xy}$ denotes the covariance of $x$ and $y$; and $C_1$, $C_2$, and $C_3$ are constants that keep $l(x,y)$, $c(x,y)$, and $s(x,y)$ stable. The value of SSIM ranges from 0 to 1; the greater the value, the more similar the two patches. It is generally posited that $l(x,y)$, $c(x,y)$, and $s(x,y)$ contribute equally to SSIM, so $\alpha=\beta=\gamma=1$ is conventionally set (Z. Wang et al. Citation2004).
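A minimal patch-level SSIM sketch following Eqs. (2)–(5), with α = β = γ = 1, is given below; the constants C1 and C2 (and C3 = C2/2) are the conventional values for 8-bit data and are assumptions rather than settings reported in the paper.

```python
import numpy as np

def ssim_patch(x, y, C1=(0.01 * 255) ** 2, C2=(0.03 * 255) ** 2):
    """SSIM between two image patches x and y, with alpha = beta = gamma = 1."""
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    mu_x, mu_y = x.mean(), y.mean()
    sig_x, sig_y = x.std(), y.std()
    sig_xy = ((x - mu_x) * (y - mu_y)).mean()
    C3 = C2 / 2.0
    l = (2 * mu_x * mu_y + C1) / (mu_x ** 2 + mu_y ** 2 + C1)      # brightness, Eq. (3)
    c = (2 * sig_x * sig_y + C2) / (sig_x ** 2 + sig_y ** 2 + C2)  # contrast, Eq. (4)
    s = (sig_xy + C3) / (sig_x * sig_y + C3)                       # structure, Eq. (5)
    return l * c * s                                               # Eq. (2)
```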

2.3. CI measure

GSGs $G_{(i,j)}^X$ and $G_{(i,j)}^Y$ are constructed in images $X$ and $Y$ respectively, and together express the structural features of the images and the changes between them. However, due to the differences in the imaging characteristics of multimodal images, the changes between GSGs cannot be effectively measured by direct comparison. To accurately measure the differences between multimodal images and circumvent their imaging discrepancies, GSGs $G_{(i,j)}^X$ and $G_{(i,j)}^Y$ are mapped to image domains $Y$ and $X$ respectively to obtain the mapped graphs $G_{(i,j)}^{X\mathrm{map}}$ and $G_{(i,j)}^{Y\mathrm{map}}$, so that the multimodal images can be compared in the same image domain. As an example, GSG $G_{(i,j)}^X$ is mapped into image $Y$ (forward mapping), and its mapped GSG $G_{(i,j)}^{X\mathrm{map}}$ is defined as follows:
$$
\begin{aligned}
G_{(i,j)}^{X\mathrm{map}} &= \{V_{(i,j)}^{X\mathrm{map}},\ E_{(i,j)}^{X\mathrm{map}},\ w_{(i,j)}^{X\mathrm{map}}\} \\
V_{(i,j)}^{X\mathrm{map}} &= \{p_{(\tilde m,\tilde n)}^Y \mid (\tilde m,\tilde n) \in \Omega_{(m,n)}^X,\ (i,j) \ne (\tilde m,\tilde n)\}, \quad |V_{(i,j)}^{X\mathrm{map}}| = N \\
E_{(i,j)}^{X\mathrm{map}} &= \{(p_{(i,j)}^Y, p_{(\tilde m,\tilde n)}^Y) \mid p_{(i,j)}^Y, p_{(\tilde m,\tilde n)}^Y \in V_{(i,j)}^{X\mathrm{map}}\} \\
w^{X\mathrm{map}}(p_{(i,j)}^Y, p_{(\tilde m,\tilde n)}^Y) &= \mathrm{sim}(p_{(i,j)}^Y, p_{(\tilde m,\tilde n)}^Y), \quad (p_{(i,j)}^Y, p_{(\tilde m,\tilde n)}^Y) \in E_{(i,j)}^{X\mathrm{map}}
\end{aligned}
\tag{6}
$$
Through forward mapping, the connectivity between the vertices of the mapped GSG $G_{(i,j)}^{X\mathrm{map}}$ is consistent with that of GSG $G_{(i,j)}^X$, but the patches of the mapped GSG $G_{(i,j)}^{X\mathrm{map}}$ are all represented using the pixel values of image $Y$. The mapped GSG $G_{(i,j)}^{X\mathrm{map}}$ and GSG $G_{(i,j)}^Y$ are therefore in the same image domain, so the CI of the region where the target patch $p_{(i,j)}^Y$ is located can be measured. In this paper, we consider that the structural differences between $G_{(i,j)}^{X\mathrm{map}}$ and $G_{(i,j)}^Y$ are mainly composed of the following two parts:
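The following sketch illustrates the forward mapping of Eq. (6) under the same assumptions as the earlier sketches (it reuses the hypothetical `extract_patch` and `ssim_patch` helpers): the ranked vertex positions Ω^X of G^X are kept, but every patch is re-read from image Y, so the recomputed weights describe the mapped graph in the Y domain.

```python
import numpy as np

def map_gsg(omega_x, img_y, i, j, ps, ssim_patch, extract_patch):
    """Forward-map the GSG built on image X into the Y domain (Eq. (6)).

    The vertex positions and their ordering come from Omega^X; only the pixel
    values used to compute the edge weights are taken from image Y.
    """
    target_y = extract_patch(img_y, i, j, ps)
    w_map = np.array([ssim_patch(target_y, extract_patch(img_y, m, n, ps))
                      for (m, n) in omega_x])
    return w_map  # weights of the mapped GSG, ordered as in Omega^X
```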

1) Intrinsic change of the mapped GSG $G_{(i,j)}^{X\mathrm{map}}$. If the region represented by target patch $p_{(i,j)}^Y$ remains unchanged, the internal pattern of the mapped GSG $G_{(i,j)}^{X\mathrm{map}}$ is stable; that is, the similarity of the vertex patches to target patch $p_{(i,j)}^Y$ is stable. Otherwise, once the region represented by target patch $p_{(i,j)}^Y$ changes, the similarity of the vertex patches to target patch $p_{(i,j)}^Y$ changes, and this change expresses the intrinsic change of GSG $G_{(i,j)}^{X\mathrm{map}}$. It is calculated as follows:
$$
\mathrm{dif}_{(i,j)}^{Y1} = \frac{1}{N}\sum_{s=1}^{N}\left|\,\mathrm{sim}\!\left(p_{(i,j)}^Y, p_{(m,n)_s}^Y\right) - \mathrm{sim}\!\left(p_{(i,j)}^Y, p_{(\tilde m,\tilde n)_s}^Y\right)\right|
\tag{7}
$$
where $(m,n)_s \in \Omega_{(m,n)}^Y$ denotes the index of the vertex patch $p_{(m,n)_s}^Y$ that ranks $s$-th in similarity to target patch $p_{(i,j)}^Y$ in vertex set $V_{(i,j)}^Y$, and $(\tilde m,\tilde n)_s \in \Omega_{(m,n)}^X$ represents the index of the vertex patch $p_{(\tilde m,\tilde n)_s}^Y$ that ranks $s$-th in similarity to target patch $p_{(i,j)}^X$ in vertex set $V_{(i,j)}^{X\mathrm{map}}$. The change in the region represented by target patch $p_{(i,j)}^Y$ can be calculated from the difference in the similarity of vertex patches $p_{(m,n)_s}^Y$ and $p_{(\tilde m,\tilde n)_s}^Y$ to target patch $p_{(i,j)}^Y$. For instance, if vertex patch $p_{(m,n)_s}^X$ in GSG $G_{(i,j)}^X$ exhibits a high similarity to target patch $p_{(i,j)}^X$, and the region represented by target patch $p_{(i,j)}^Y$ has changed, vertex patch $p_{(\tilde m,\tilde n)_s}^Y$ in GSG $G_{(i,j)}^{X\mathrm{map}}$ will display a low similarity to target patch $p_{(i,j)}^Y$, resulting in a small value of $\mathrm{sim}(p_{(i,j)}^Y, p_{(\tilde m,\tilde n)_s}^Y)$. Conversely, in GSG $G_{(i,j)}^Y$, vertex patch $p_{(m,n)_s}^Y$ and target patch $p_{(i,j)}^Y$ exhibit a high similarity, and $\mathrm{sim}(p_{(i,j)}^Y, p_{(m,n)_s}^Y)$ has a large value. Consequently, the disparity between $\mathrm{sim}(p_{(i,j)}^Y, p_{(\tilde m,\tilde n)_s}^Y)$ and $\mathrm{sim}(p_{(i,j)}^Y, p_{(m,n)_s}^Y)$ is more noticeable, accentuating the change of the region represented by target patch $p_{(i,j)}^Y$.

It is worth noting that the vertex patches with the highest and lowest similarities are often the most sensitive to changes in the region represented by the target patch. This is because, once the region represented by target patch $p_{(i,j)}^X$ changes, the similarity between the most and least similar vertex patches in the mapped graph $G_{(i,j)}^{X\mathrm{map}}$ and target patch $p_{(i,j)}^Y$ will change with high probability. To further highlight changes, we employ a weighting approach to amplify the change in the similarity of vertex patch $p_{(m,n)_s}^Y$ to target patch $p_{(i,j)}^Y$, so (7) can be reformulated as follows:
$$
\mathrm{dif}_{(i,j)}^{Y1} = \frac{1}{N}\sum_{s=1}^{N}\left|\,\beta_{(m,n)_s}^{Y_1}\,\mathrm{sim}\!\left(p_{(i,j)}^Y, p_{(m,n)_s}^Y\right) - \beta_{(\tilde m,\tilde n)_s}^{Y_2}\,\mathrm{sim}\!\left(p_{(i,j)}^Y, p_{(\tilde m,\tilde n)_s}^Y\right)\right|
\tag{8}
$$
$$
\beta_{(m,n)_s}^{Y_1} = \exp\!\left(\lambda\left|\,\mathrm{sim}\!\left(p_{(i,j)}^Y, p_{(m,n)_s}^Y\right) - \frac{1}{N}\sum_{(m,n)_s\in\Omega_{(m,n)}^Y}\mathrm{sim}\!\left(p_{(i,j)}^Y, p_{(m,n)_s}^Y\right)\right|\right)
\tag{9}
$$
$$
\beta_{(\tilde m,\tilde n)_s}^{Y_2} = \exp\!\left(\lambda\left|\,\mathrm{sim}\!\left(p_{(i,j)}^Y, p_{(\tilde m,\tilde n)_s}^Y\right) - \frac{1}{N}\sum_{(\tilde m,\tilde n)_s\in\Omega_{(m,n)}^X}\mathrm{sim}\!\left(p_{(i,j)}^Y, p_{(\tilde m,\tilde n)_s}^Y\right)\right|\right)
\tag{10}
$$
where $\beta_{(m,n)_s}^{Y_1}$ and $\beta_{(\tilde m,\tilde n)_s}^{Y_2}$ denote the variation weights of vertex patches $p_{(m,n)_s}^Y$ and $p_{(\tilde m,\tilde n)_s}^Y$ respectively, and $\lambda$ is the weight coefficient of $\beta_{(m,n)_s}^{Y_1}$ and $\beta_{(\tilde m,\tilde n)_s}^{Y_2}$.

2) Difference between the mapped GSG $G_{(i,j)}^{X\mathrm{map}}$ and GSG $G_{(i,j)}^Y$. Once a region in the image has changed, the structural features of the mapped GSG $G_{(i,j)}^{X\mathrm{map}}$ and GSG $G_{(i,j)}^Y$ become dissimilar, and the structural difference between them reflects the change in the region. This structural difference is calculated as follows:
$$
\mathrm{dif}_{(i,j)}^{Y2} = \exp(\lambda) - \frac{1}{N}\sum_{s=1}^{N}\left|\,\beta_{(m,n)_s}^{Y_1}\,\mathrm{sim}\!\left(p_{(m,n)_s}^Y, p_{(\tilde m,\tilde n)_s}^Y\right)\right|
\tag{11}
$$
To be more specific, if the region represented by target patch $p_{(i,j)}^Y$ remains unchanged, vertex patches $p_{(m,n)_s}^Y$ and $p_{(\tilde m,\tilde n)_s}^Y$ are likely to retain a high similarity to target patch $p_{(i,j)}^Y$, so $\mathrm{sim}(p_{(m,n)_s}^Y, p_{(\tilde m,\tilde n)_s}^Y)$ is large and $\mathrm{dif}_{(i,j)}^{Y2}$ is small. However, if the region represented by target patch $p_{(i,j)}^Y$ changes, the similarity of vertex patch $p_{(\tilde m,\tilde n)_s}^Y$ to target patch $p_{(i,j)}^Y$ is likely to change, causing $\mathrm{sim}(p_{(m,n)_s}^Y, p_{(\tilde m,\tilde n)_s}^Y)$ to be small and $\mathrm{dif}_{(i,j)}^{Y2}$ to be large with high probability, thus signifying a significant change in the region represented by target patch $p_{(i,j)}^Y$.

The final structural difference $\mathrm{dif}_{(i,j)}^{Y}$ of the region represented by target patch $p_{(i,j)}^Y$ is the sum of the two structural differences above:
$$
\mathrm{dif}_{(i,j)}^{Y} = \mathrm{dif}_{(i,j)}^{Y1} + \mathrm{dif}_{(i,j)}^{Y2}
\tag{12}
$$
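A hedged sketch of the weighted change-intensity measure (Eqs. (8)–(12)) for one target patch in the Y domain is shown below. The arrays `sim_own`, `sim_map`, and `sim_cross` (the similarities of the target patch to the vertex patches ranked by Ω^Y, to those ranked by Ω^X after mapping, and between the s-th patches of the two rankings, respectively) are assumed to have been precomputed; their names and the exact form of Eq. (11) as reconstructed above are assumptions.

```python
import numpy as np

def change_intensity(sim_own, sim_map, sim_cross, lam=2.0):
    """Weighted structural difference of one target patch (Eqs. (8)-(12))."""
    sim_own = np.asarray(sim_own, dtype=float)
    sim_map = np.asarray(sim_map, dtype=float)
    sim_cross = np.asarray(sim_cross, dtype=float)
    # variation weights, Eqs. (9)-(10): patches whose similarity deviates most
    # from the mean similarity receive larger weights
    beta1 = np.exp(lam * np.abs(sim_own - sim_own.mean()))
    beta2 = np.exp(lam * np.abs(sim_map - sim_map.mean()))
    # intrinsic change of the mapped GSG, Eq. (8)
    dif1 = np.mean(np.abs(beta1 * sim_own - beta2 * sim_map))
    # difference between the mapped GSG and the GSG of Y, Eq. (11)
    dif2 = np.exp(lam) - np.mean(np.abs(beta1 * sim_cross))
    return dif1 + dif2  # final structural difference, Eq. (12)
```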

For each image pixel $(s,t)$, $1 \le s \le H$, $1 \le t \le W$, let $F_{(s,t)}^Y$ denote the set of structural differences $\mathrm{dif}_{(i,j)}^Y$ of all target patches that contain the pixel; their mean value is taken as the change information of pixel $(s,t)$, and the forward-mapping CIM $\mathrm{CIM}^{fw}$ is calculated as follows:
$$
\mathrm{CIM}_{(s,t)}^{fw} = \frac{1}{|F_{(s,t)}^Y|}\sum_{\mathrm{dif}_{(i,j)}^Y \in F_{(s,t)}^Y} \mathrm{dif}_{(i,j)}^Y
\tag{13}
$$
In the same way, the backward-mapping CIM $\mathrm{CIM}^{bw}$ can be obtained by mapping GSG $G_{(i,j)}^Y$ to image domain $X$.

Setting the target patch's step size to $\Delta_p = p_s$ accelerates the calculation, and the search step size of the vertex patches is set to $\Delta_v = \lfloor \alpha_{\Delta_v} \times \min(H/2, W/2) \rfloor$, where $\alpha_{\Delta_v}$ denotes the step size factor that regulates the step size, and $\lfloor\cdot\rfloor$ is the downward rounding (floor) operation.

2.4. CIMs fusion

The CI of each image pixel is represented by the CIMs $\mathrm{CIM}^{fw}$ and $\mathrm{CIM}^{bw}$, which are generated from image domains $D^{fw}$ and $D^{bw}$ respectively. The change intensities at the same index in $\mathrm{CIM}^{fw}$ and $\mathrm{CIM}^{bw}$ can differ because images $X$ and $Y$ are from distinct data modalities. Therefore, to obtain a robust CIM, $\mathrm{CIM}^{fw}$ and $\mathrm{CIM}^{bw}$ must be fused, and both the global and local structural features of $\mathrm{CIM}^{fw}$ and $\mathrm{CIM}^{bw}$ are considered for the fusion.

Inspired by latent low-rank representation (LLRR) (G. Liu and Yan Citation2011), we decompose the CIMs $\mathrm{CIM}^{fw}$ and $\mathrm{CIM}^{bw}$ into a low-rank part (global structure), a significance part (local structure), and an image noise part. The CIM decomposition is obtained with the following optimization function:
$$
\min_{Z,L,E}\ \|Z\|_* + \|L\|_* + \lambda\|E\|_1 \quad \text{s.t.}\quad D = DZ + LD + E
\tag{14}
$$
where $D$ is the CIM, $Z$ represents the low-rank coefficient, $L$ denotes the significance coefficient, $\lambda$ is a positive parameter, and $\|\cdot\|_*$ and $\|\cdot\|_1$ are the nuclear norm and $\ell_1$ norm respectively.

To facilitate the solution, (14) is rewritten in the following equivalent form:
$$
\min_{Z,L,J,S,E}\ \|J\|_* + \|S\|_* + \lambda\|E\|_1 \quad \text{s.t.}\quad D = DZ + LD + E,\ Z = J,\ L = S
\tag{15}
$$

Based on the augmented Lagrange multiplier (ALM) method, (15) can be converted into an unconstrained augmented Lagrange function:
$$
\begin{aligned}
\mathcal{L}(J,S,Z,L,E,\eta) ={}& \|J\|_* + \|S\|_* + \lambda\|E\|_1 \\
&+ \mathrm{tr}\!\left(\eta_1^{T}(D - DZ - LD - E)\right) + \mathrm{tr}\!\left(\eta_2^{T}(Z - J)\right) + \mathrm{tr}\!\left(\eta_3^{T}(L - S)\right) \\
&+ \frac{\mu}{2}\|D - DZ - LD - E\|_F^2 + \frac{\mu}{2}\left(\|Z - J\|_F^2 + \|L - S\|_F^2\right)
\end{aligned}
\tag{16}
$$
where $\mathrm{tr}(\cdot)$ and $\|\cdot\|_F$ are the trace and Frobenius norm of a matrix, $\eta_1$, $\eta_2$, and $\eta_3$ are the Lagrange multipliers, and $\mu$ is the penalty parameter. (16) is solved by updating $J$, $S$, $Z$, $L$, and $E$ in turn while fixing the other variables.

With the above calculation, the low-rank coefficients $Z^{fw}$, $Z^{bw}$ and significance coefficients $L^{fw}$, $L^{bw}$ of CIMs $\mathrm{CIM}^{fw}$ and $\mathrm{CIM}^{bw}$ can be obtained. Mean-weighted fusion is used for the low-rank parts of the CIMs in order to preserve the low-rank information:
$$
\mathrm{CIM}_Z^{fusion} = \frac{1}{2}\left(Z^{fw}\times \mathrm{CIM}^{fw} + Z^{bw}\times \mathrm{CIM}^{bw}\right)
\tag{17}
$$

Furthermore, to preserve the significance features of the CIMs and highlight the changed regions, the significance parts are fused using square-weighted fusion:
$$
\mathrm{CIM}_L^{fusion} = \frac{1}{2}\left(\left(L^{fw}\times \mathrm{CIM}^{fw}\right)^2 + \left(L^{bw}\times \mathrm{CIM}^{bw}\right)^2\right)
\tag{18}
$$

The fused CIM $\mathrm{CIM}^{fusion}$ is then calculated as follows:
$$
\mathrm{CIM}^{fusion} = \mathrm{CIM}_Z^{fusion} + \mathrm{CIM}_L^{fusion}
\tag{19}
$$
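The fusion step can be sketched as below, assuming the LLRR coefficients Z^fw, Z^bw, L^fw, L^bw have already been estimated by solving (16); reading the products in Eqs. (17)–(18) as the matrix products of the decomposition D = DZ + LD + E is an interpretation, not something stated explicitly in the text.

```python
import numpy as np

def fuse_cims(cim_fw, cim_bw, Z_fw, Z_bw, L_fw, L_bw):
    """Fuse the forward and backward CIMs from their LLRR coefficients (Eqs. (17)-(19))."""
    # low-rank (global) parts of each CIM, taken here as D @ Z
    low_fw, low_bw = cim_fw @ Z_fw, cim_bw @ Z_bw
    # significance (local) parts of each CIM, taken here as L @ D
    sig_fw, sig_bw = L_fw @ cim_fw, L_bw @ cim_bw
    cim_z = 0.5 * (low_fw + low_bw)            # mean-weighted fusion, Eq. (17)
    cim_l = 0.5 * (sig_fw ** 2 + sig_bw ** 2)  # square-weighted fusion, Eq. (18)
    return cim_z + cim_l                       # fused CIM, Eq. (19)
```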

2.5. CM generation

To further distinguish the changed and unchanged regions, the CM is obtained using the threshold segmentation method proposed in (Hou, Wang, and Liu Citation2017):
$$
\mathrm{CM}(s,t)=
\begin{cases}
1, & \text{if } \mathrm{CIM}_{(s,t)}^{fusion} \ge \zeta \times \mathrm{mean}(\mathrm{CIM}^{fusion})\\
0, & \text{otherwise}
\end{cases}
\tag{20}
$$
where $\zeta$ represents the user-adjustable threshold parameter and $\mathrm{mean}(\cdot)$ denotes the mean value operation. The GSGM framework is summarized in Table 1:
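A minimal sketch of the segmentation rule in Eq. (20); the default ζ value here is only an example taken from the range reported in Section 3.5.1.

```python
import numpy as np

def generate_cm(cim_fusion, zeta=1.5):
    """Binary change map from the fused CIM (Eq. (20)): 1 = changed, 0 = unchanged."""
    threshold = zeta * cim_fusion.mean()
    return (cim_fusion >= threshold).astype(np.uint8)
```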

Table 1. GSGM framework.

3. Experiments and discussion

In this section, nine experimental datasets are introduced and used in an accuracy evaluation with several comparison methods, followed by the experimental results and parameter analysis of the proposed GSGM.

3.1. Datasets

The effectiveness of the proposed GSGM in MCD and its applicability to UCD are validated using five multimodal datasets (datasets #1–#5) and four unimodal datasets (datasets #6–#9). The multimodal datasets include images from different satellite types, such as dataset #1 (optical and SAR images), dataset #2 (LiDAR and optical images), datasets #3 and #5 (images of the same sensor type but from different sensors), and dataset #4 (optical and normalized difference vegetation index (NDVI) images). The unimodal data consist of two optical datasets and two SAR datasets (Figure 2). All images underwent preprocessing: radiometric calibration, geometric calibration, and atmospheric correction for the optical imagery, and radiometric calibration, geometric calibration, filtering, and interferometric processing for the SAR imagery. These preprocessing steps are intended to eliminate or reduce the influence of the sensors, platforms, and environment in order to make the images more suitable for further analysis.

Figure 2. Datasets #1–#9; each dataset from left to right is the pre-event image, post-event image, and reference CM.


As shown in Table 2, the datasets used in this study span a large time window (the earliest image was acquired in December 1999 and the most recent in May 2018), a variety of sensors with resolutions ranging from 2 m (dataset #6) to 30 m (dataset #5) in different regions of the world, and image sizes ranging from 300 × 412 to 3,500 × 2,000, covering multiple change scenarios (e.g. urban construction, flooding, river expansion, etc.). These nine datasets thus allow the robustness and efficiency of the proposed GSGM to be demonstrated.

Table 2. Descriptions of datasets. In the ‘size’ column, the numbers within the parentheses indicate the number of bands for the imagery of the second time phase. For example, in dataset #1, 600 × 600 × 3(1) signifies that the imagery from the first time phase has 3 bands, while the imagery from the second time phase has 1 band.

3.2. Evaluation metrics

Empirical receiver operating characteristic (ROC) curves are used to evaluate the effectiveness of the CIMs produced by each comparison method. ROC curves represent the estimated pixel-wise probability of detection (PD) as a function of the probability of false alarm (PFA) obtained by varying the binary segmentation threshold. The area under the curve (AUC) of the ROC curve is used as the evaluation metric. For binary CMs, overall accuracy (OA), Kappa coefficient (KC), and F1-measure (F1) are used as the evaluation metrics:
$$\mathrm{OA} = (TP + TN)/N_p\tag{21}$$
$$\mathrm{KC} = (\mathrm{OA} - \mathrm{PRE})/(1 - \mathrm{PRE})\tag{22}$$
where
$$\mathrm{PRE} = \frac{(TP + FN)(TP + FP) + (TN + FP)(TN + FN)}{N_p^2}\tag{23}$$
$$\mathrm{F1} = 2PR/(P + R)\tag{24}$$
where
$$P = TP/(TP + FP)\tag{25}$$
$$R = TP/(TP + FN)\tag{26}$$
where $N_p$ represents the total number of pixels in the image; TP (true positives) is the number of correctly detected changed pixels; TN (true negatives) is the number of correctly detected unchanged pixels; FP (false positives) is the number of unchanged pixels incorrectly detected as changed; and FN (false negatives) is the number of changed pixels incorrectly detected as unchanged.
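For reference, the binary-CM metrics of Eqs. (21)–(26) can be computed from a predicted CM and a reference CM as in the following sketch (both inputs are H × W arrays with 1 for changed pixels).

```python
import numpy as np

def evaluate_cm(cm_pred, cm_ref):
    """Return OA, KC, and F1 of a binary change map against the reference map."""
    tp = int(np.sum((cm_pred == 1) & (cm_ref == 1)))  # changed pixels detected as changed
    tn = int(np.sum((cm_pred == 0) & (cm_ref == 0)))  # unchanged pixels detected as unchanged
    fp = int(np.sum((cm_pred == 1) & (cm_ref == 0)))  # unchanged pixels detected as changed
    fn = int(np.sum((cm_pred == 0) & (cm_ref == 1)))  # changed pixels detected as unchanged
    n_p = tp + tn + fp + fn                           # total number of pixels
    oa = (tp + tn) / n_p                                              # Eq. (21)
    pre = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n_p ** 2  # Eq. (23)
    kc = (oa - pre) / (1 - pre)                                       # Eq. (22)
    p = tp / (tp + fp)                                                # Eq. (25)
    r = tp / (tp + fn)                                                # Eq. (26)
    f1 = 2 * p * r / (p + r)                                          # Eq. (24)
    return oa, kc, f1
```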

3.3. Comparison methods

To prove the effectiveness of the proposed GSGM, six state-of-the-art methods are selected as comparison methods:

  1. LTFL (Zhan et al. Citation2018): The logarithmic transformation feature learning framework (LTFL) employs a stacked denoising autoencoder to extract high-dimensional features from remote sensing images, and selects reliable samples to train a neural network classifier that distinguishes between changed and unchanged regions.

  2. INLPG (Sun et al. Citation2022): Improved nonlocal patch-based graph (INLPG) builds a K-nearest neighbor (KNN) graph in the two temporal images respectively, and compares the KNN graph in the same image domain to extract the change information.

  3. GBF (Jimenez-Sierra et al. Citation2020): Graph-based fusion (GBF) uses the Laplacian matrix of the regularized graph to minimize the graph similarity of the two temporal images and highlight the changed regions.

  4. IRG-McS (Sun et al. Citation2021): Iterative robust graph and Markovian co-segmentation models (IRG-McS) construct a KNN graph to represent the structure of each image, and the CIM is obtained through the cross-mapping of the graph. On the basis of Markovian co-segmentation, the construction of the KNN graph and change information are iteratively optimized.

  5. SCASC (Sun et al. Citation2022): Sparse-constrained adaptive structure consistency (SCASC) is used to construct a regression model of the structure consistency image, and the prior sparse knowledge from the CIM is used to distinguish between changed and unchanged regions.

  6. SRGCAE (H. Chen et al. Citation2022): Structural relationship graph convolutional autoencoder (SRGCAE) measures the similarity of multimodal image structural relationships to extract the changes.

3.4. Experiments

3.4.1. Experiments on multimodal datasets

Figure 3 depicts the CIMs of the various methods on multimodal datasets #1–#5. The comparison methods in dataset #1 generally highlight the changed regions, but the intensity of change varies significantly across the changed regions. Among the comparison methods, except for SRGCAE, all failed to detect changes in the right half of dataset #2. The CIMs of the other comparison methods display the changed regions in dataset #3 better than those of INLPG and GBF, although there are some gaps within the changed regions. The CI difference between the changed and unchanged regions in the CIMs of INLPG on dataset #4 is not significant. The other comparison methods can almost completely highlight the changed regions in dataset #4, but they exhibit more ‘pseudo-change’ patches in the unchanged regions. IRG-McS and SCASC produce good CIMs on dataset #5, but some FPs remain. GSGM can effectively reflect the structural changes of multimodal data by constructing the GSG. The ROC curves of the CIMs of the different methods on datasets #1–#5 are plotted in Figure 4, and it can be seen that the proposed GSGM achieves the optimal ROC curves. Therefore, the CIMs of GSGM on datasets #1–#5 effectively highlight the changed regions while preserving their internal integrity, and have the fewest ‘pseudo-change’ patches in the unchanged regions.

Figure 3. CIMs of different methods on multimodal datasets: (a1–a5) LTFL; (b1–b5) INLPG; (c1–c5) GBF; (d1–d5) IRG-McS; (e1–e5) SCASC; (f1–f5) SRGCAE; (g1–g5) GSGM.


Figure 4. ROC curves on (a) dataset #1, (b) dataset #2, (c) dataset #3, (d) dataset #4 and (e) dataset #5.


The CMs of the different methods on the multimodal datasets are shown in Figure 5. In dataset #1, LTFL has more FPs due to the presence of shadow differences in the land parts of the images. INLPG, GBF, IRG-McS, SCASC, and SRGCAE overcome the effects of image shadowing and have fewer FPs, but produce large numbers of FNs.

Figure 5. CMs of different methods on multimodal datasets: (a1–a5) LTFL; (b1–b5) INLPG; (c1–c5) GBF; (d1–d5) IRG-McS; (e1–e5) SCASC; (f1–f5) SRGCAE; (g1–g5) GSGM; (h1–h5) reference CM.


GSGM obtains the most complete changed regions while suppressing image shadows. Dataset #2 mostly captures changes in an urban environment. None of the comparison methods, excluding LTFL, were able to identify the changes in the right half of dataset #2, and GSGM has fewer FPs than LTFL. Datasets #3 and #4 show changes caused by flooding. Visually, LTFL, INLPG, and GBF all show significant FPs, whereas IRG-McS, SCASC, and SRGCAE have comparatively fewer FPs, and GSGM comes closest to the reference CM overall. The challenges of dataset #5 are that the changed regions are rather small and both images contain shadowing. In the land region, LTFL, INLPG, and GBF yield more FPs, while IRG-McS, SCASC, and SRGCAE produce comparatively fewer FPs. The proposed GSGM has the fewest FPs and detects the most complete changed regions. The accuracy assessments of the comparison methods on multimodal datasets #1–#3 and #4–#5 are shown in Tables 3 and 4 respectively, with the greatest accuracy denoted in bold. It is clear that GSGM obtains the highest AUC, OA, KC, and F1 values across all multimodal datasets, demonstrating its efficiency in identifying changes across multimodal images based on global structural features.

Table 3. Quantitative measures of binary CMs on multimodal datasets. The bolded font indicates the highest value.

Table 4. Quantitative measures of binary CMs on multimodal datasets. The bolded font indicates the highest value.

3.4.2. Experiments on unimodal datasets

Four unimodal datasets were used to confirm the effectiveness of the proposed method on unimodal data. The CIMs of the different methods on the unimodal datasets are shown in Figure 6. In dataset #6, LTFL fails to highlight the changed regions, and GBF highlights the unchanged regions as well as the changed ones. The other comparison methods have lower CI in the unchanged regions, but the highlighted changed regions are not internally connected. In dataset #7, GBF does not clearly highlight the changed regions, while the other comparison methods only partially highlight them, and only GSGM completely highlights the changed region in the upper right corner of the image. Although the other methods basically highlight the changed regions in datasets #8 and #9, there are some voids inside the changed regions, and GBF fails to highlight the changed regions in these datasets. As illustrated in Figure 6 (g1–g4), the changed regions are desirably highlighted by GSGM with the best internal connectivity. Figure 7 shows the ROC curves of the CIMs of the different methods on the unimodal datasets, where it can be seen that the proposed GSGM achieves results comparable to those of the comparison methods on dataset #7 and the optimal ROC curves on datasets #8 and #9.

Figure 6. CIMs of different methods on unimodal datasets: (a1–a4) LTFL; (b1–b4) INLPG; (c1–c4) GBF; (d1–d4) IRG-McS; (e1–e4) SCASC; (f1–f4) SRGCAE; (g1–g4) GSGM.


Figure 7. ROC curves on (a) dataset #6, (b) dataset #7, (c) dataset #8 and (d) dataset #9.


Figure 8 illustrates the CMs of the different methods on the unimodal datasets. Datasets #6 and #7 reflect changes caused by urban construction involving complex feature types, such as buildings, vegetation, and roads. Visually, in dataset #6, LTFL and INLPG have more FPs, while GBF, IRG-McS, SCASC, and SRGCAE have fewer FPs but generate a large number of FNs. In dataset #7, GBF, LTFL, and INLPG have considerable FPs, and IRG-McS and SCASC have more FNs. In comparison to these methods, the proposed GSGM has the most comprehensive detection results. The difficulty of datasets #8 and #9 lies in the fact that the changed regions are relatively small and the images contain intense noise. LTFL and GBF do not effectively suppress the image noise and generate more FPs. INLPG, IRG-McS, SCASC, and SRGCAE overcome the effects of image noise to a certain extent, but fail to detect some minor changes. Visually, the proposed GSGM is unaffected by image noise and can detect minor changes. Owing to its change metric based on the global structural features of the image, the CIMs of the proposed GSGM can more accurately distinguish the changed and unchanged regions, and the CD result of GSGM is closest to the reference CM. From the accuracy evaluation of the CMs on the unimodal datasets in Tables 5 and 6, it can be seen that the KC and F1 of GSGM are the highest among all of the compared methods, indicating that GSGM is applicable not only to MCD but also to UCD.

Figure 8. CMs of different methods on unimodal datasets: (a1–a4) LTFL; (b1–b4) INLPG; (c1–c4) GBF; (d1–d4) IRG-McS; (e1–e4) SCASC; (f1–f4) SRGCAE; (g1–g4) GSGM; (h1–h4) reference CM.


Table 5. Quantitative measures of binary CMs on unimodal datasets. The bolded font indicates the highest value.

Table 6. Quantitative measures of binary CMs on the unimodal datasets. The bolded font indicates the highest value.

To further validate the efficacy of the proposed GSGM on unimodal datasets #6 and #7, we utilized two established UCD methods: change vector analysis (CVA) (Bovolo and Bruzzone Citation2007) and deep change vector analysis (DCVA) (Saha, Bovolo, and Bruzzone Citation2019). As depicted in Figure 9, the results on both datasets show that the CMs of CVA are prone to significant salt-and-pepper noise, a consequence of its pixel-wise approach to CD. In contrast, DCVA, which incorporates deep features from the imagery, is more effective at reducing such noise. The precision evaluation detailed in Table 7 reveals that DCVA not only surpasses CVA in accuracy but also equals the performance of GSGM on dataset #7. However, on dataset #6, DCVA's performance is slightly inferior to that of GSGM, underscoring the robustness of the proposed GSGM across different unimodal datasets.

Figure 9. The first and second rows correspond to the CMs of dataset #6 and dataset #7, respectively. (a1-a2) CVA; (b1-b2) DCVA; (c1-c2) GSGM; (d1–d2) reference CM.


Table 7. Quantitative measures of binary CMs on unimodal dataset #6 and dataset #7. The bolded font indicates the highest value.

3.5. Discussion

3.5.1. Experiment setting and parameter analysis

Based on the ROC curves, the threshold parameter ζ is set between 1.3 and 2.0 for all datasets to obtain satisfactory CMs. The step coefficient αΔv, patch size ws, and target patch step size Δp are the main parameters of GSGM. Step coefficient αΔv controls the sparsity of the vertex patches; the greater the value of αΔv, the sparser the distribution of vertex patches. Considering the accuracy and efficiency of the method, this paper fixes αΔv at 0.1 and discusses the influence of patch size ws and target patch step size Δp on the accuracy of GSGM.

1) Patch size ws. The weight coefficient λ is fixed at 2, and the patch size ws is set from 3 to 13 with an interval of 2. Figure 10 shows the influence of patch size ws on GSGM. It can be seen that datasets #1, #3, #5, #7, #8, and #9 reach their maximum AUC and KC when ws is set to 5, and datasets #2, #4, and #6 reach their maximum AUC and KC when ws is set to 9, 11, and 7 respectively. This is because datasets #1, #3, #5, #7, #8, and #9 contain relatively small changed regions, so the patch size needs to be small to ensure the detection of small changed regions, whereas the changed regions of datasets #2, #4, and #6 are relatively large, so the patch size needs to be larger to ensure the complete detection of the changed regions. Therefore, this paper suggests that when the changed regions are relatively small, patch size ws should be set to 5, and when the changed regions are relatively large, ws should be set to a larger value (e.g. 9 or 11).

Figure 10. Influence of patch size ws on GSGM performance: (a) AUC-ws curves; (b) KC-ws curves.


2) Target patch step size Δp. To evaluate the influence of the target patch step size Δp on the performance of the proposed GSGM, we maintained the patch size ws at a constant value of 5 and systematically adjusted Δp from 1 to 5. Figure 11 shows that GSGM's accuracy is maintained at a high level for Δp values of 1 and 2. However, there is a discernible decrease in accuracy as Δp continues to increase. This decline is attributed to the sparser distribution of image patches and the consequent reduction in the robustness of the extracted structural features with larger Δp values. On the other hand, a Δp value that is too small leads to an overabundance of image patches, which in turn amplifies the computational burden of the algorithm. Given these considerations, this study recommends setting Δp equal to (ws − 1)/2, which simplifies the process and optimizes computational efficiency without compromising accuracy.

Figure 11. Influence of target patch's step size Δp on GSGM performance: (a) AUC-Δp curves; (b) KC-Δp curves.


3) Computation time. To illustrate the computational efficiency of the proposed GSGM, we fix the patch size ws at 5 and use datasets #3 and #5, which are the largest and smallest in size respectively (refer to Table 2). Our implementation of GSGM uses MATLAB 2020a on a Windows desktop with an AMD Ryzen 7 3800X 8-core processor (3.89 GHz) and 64 GB of RAM. As demonstrated in Table 8, GSGM's computational time decreases as the step size parameter Δp increases. Figure 11 further confirms that GSGM maintains high accuracy when Δp ranges from 1 to 4. To reduce computation time without compromising accuracy, it is therefore recommended to set a larger Δp value. Moreover, computational efficiency can be enhanced by refining the graph construction process, such as improving the efficiency of the similarity measurements between image patches, or by leveraging parallel computing to distribute tasks across multiple cores or nodes.

Table 8. Computational time (seconds) of the proposed GSGM on datasets #3 and #5 under different values of Δp.

3.5.2. Ablation study

The proposed GSGM consists of three main tasks: GSG construction, CI measurement, and CIMs fusion. The effectiveness of the core components of these three tasks (the GSG, the weighted change metric (WCM), and LLRR) is discussed separately below.

1) The effectiveness of GSG. To validate the efficacy of the GSG, a graph built from the top 10% of vertex patches most similar to each target patch, following INLPG (Sun et al. Citation2022), is used as a control. The results in Table 9 demonstrate that the GSG attains the highest AUC and KC values across all datasets, with AUC increasing by 3.58% and KC by 11.79% on average. This proves that the GSG better expresses the structural features of the images.

Table 9. Quantitative measures of different graphs on datasets #1–#9.

2) The effectiveness of WCM. The weight coefficient λ affects the contribution of each vertex patch to the CI of the target patch, and λ is set to 0, 0.5, 1, 1.5, 2, 2.5, and 3 respectively. For datasets #1, #3, #5, #7, #8, and #9, patch size ws is set to 5, and for datasets #2, #4, and #6, ws is set to 9, 11, and 7 respectively. Figure 12 shows the influence of the weight coefficient λ on GSGM. As can be seen from Figure 12, most datasets reach their best AUC and KC values when λ is 2.0; only datasets #1 and #2 achieve their maximum AUC and KC values when λ is 1.5; and when λ is set to 0, all datasets obtain lower accuracy. This indicates the effectiveness of the WCM. For simplicity, this paper suggests setting the weight coefficient λ to 2.0.

Figure 12. Influence of weight coefficient λ on GSGM performance; (a) AUC-λ curves; (b) KC-λ curves.


3) The effectiveness of the CI measure. In this paper, we explored two distinct components for measuring changes: the intrinsic change of the GSG (referred to as $\mathrm{dif}_{(i,j)}^{1}$) and the difference between the mapped GSG and the GSG (referred to as $\mathrm{dif}_{(i,j)}^{2}$). To assess the individual impacts of $\mathrm{dif}_{(i,j)}^{1}$ and $\mathrm{dif}_{(i,j)}^{2}$, we conducted two separate experiments, each using only one component to analyze the changes in the multimodal imagery. The outcomes of these ablation experiments are detailed in Table 10, which reveals that using $\mathrm{dif}_{(i,j)}^{1}$ and $\mathrm{dif}_{(i,j)}^{2}$ together, rather than relying solely on either one, leads to a notable enhancement in performance: on average, AUC increases by 2.38% and 2.14%, and KC by 6.67% and 5.3%, respectively. These findings suggest that the combined use of the GSG's intrinsic change and the difference between the mapped GSG and the GSG significantly improves the representation of change information and enhances the precision of CD.

Table 10. Effectiveness of CI measurement.

4) The effectiveness of LLRR. For CIMs fusion, mean weighted fusion (MWF) and wavelet fusion (WF) (Pu Citation2000) are utilized as controls. As evident in Table 11, the proposed GSGM continues to achieve the optimal results, elevating AUC values by 2.80% and 1.63% and improving KC values by 4.56% and 4.37% over MWF and WF respectively. These findings validate that the proposed method can effectively enhance the representation of change information and improve detection accuracy by fusing the low-rank and significance components of the CIMs.

Table 11. Quantitative measures of different DIs fusion methods on datasets #1–#9.

4. Conclusion

In order to address the issue of ‘incomparability’ caused by the large imaging differences between multimodal images, this paper proposes a global structure graph mapping (GSGM)-based MCD method. GSGM constructs the GSGs of multimodal images to represent their structural information, then compares them by cross-mapping the GSGs into the same image domain. A new weighted change metric formula is built to obtain robust change information, the forward and backward CIMs are fused using low-rank representation to obtain a robust CIM, and the CM is obtained through threshold segmentation. Experiments on five multimodal datasets and four unimodal datasets demonstrate the superiority and robustness of the proposed method.

The proposed GSGM has certain limitations. Specifically, there is room for the enrichment and diversification of the data types currently used. Furthermore, the method’s detection accuracy requires further improvement, especially for complex change scenes (such as datasets #3 and #6). In future research, we will focus on developing the diversity of evaluated datasets to enhance the CD capabilities across various image types. For example, we will continue to consider the polarization mode and image quality of SAR imagery (including scalloping and speckle effects), and their impact on image structural features. Building upon this, we will take into account the physical characteristics of SAR imagery to enhance its applicability to MCD tasks within complex scenarios. In order to further reduce the impact of imaging conditions (such as atmospheric effects, thermal noise, and illumination) on the quality of remote sensing images, and to enhance the accuracy of constructing GSGs, we will consider further improvements in the quality of remote sensing images in future research (such as spectral calibration for hyperspectral imagery (Hong et al. Citation2019)). Meanwhile, refining graph construction to improve the representation of structural image features also holds promise for strengthening the method’s versatility.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Funding

This study was supported by the National Natural Science Foundation of China (Grant 41971313); National Natural Science Foundation of China (Grant 42271411); Scientific Research Innovation Project for Graduate Students in Hunan Province (No. CX20220169); and Research Project on Monitoring and Early Warning Technologies for the Implementation of Land Use Planning in Guangzhou City (2020B0101130009).

Data availability statement

Datasets #1–#2 and #8–#9 that support this study are available at [https://github.com/rshante0426/MCD-datasets]; datasets #3–#5 that support this study are available at [https://www-labs.iro.umontreal.ca/~mignotte/]; and datasets #6–#7 that support this study are available at [https://github.com/MinZHANG-WHU/FDCNN?tab=readme-ov-file].

References

  • Alberga, Vito. 2009. “Similarity Measures of Remotely Sensed Multi-Sensor Images for Change Detection Applications.” Remote Sensing 1 (3): 122–143. https://doi.org/10.3390/rs1030122.
  • Bovolo, Francesca, and Lorenzo Bruzzone. 2007. “A Theoretical Framework for Unsupervised Change Detection Based on Change Vector Analysis in the Polar Domain.” IEEE Transactions on Geoscience and Remote Sensing 45 (1): 218–236. https://doi.org/10.1109/TGRS.2006.885408.
  • Brunner, Dominik, Guido Lemoine, and Lorenzo Bruzzone. 2010. “Earthquake Damage Assessment of Buildings Using VHR Optical and SAR Imagery.” IEEE Transactions on Geoscience and Remote Sensing 48 (5): 2403–2420. https://doi.org/10.1109/TGRS.2009.2038274.
  • Chen, Yuzeng, Yuqi Tang, Te Han, Yuwei Zhang, Bin Zou, and Huihui Feng. 2022. “RAMC: A Rotation Adaptive Tracker with Motion Constraint for Satellite Video Single-Object Tracking.” Remote Sensing 14 (13): 3108. https://doi.org/10.3390/rs14133108.
  • Chen, Yuzeng, Yuqi Tang, Yi Xiao, Qiangqiang Yuan, Yuwei Zhang, Fengqing Liu, Jiang He, and Liangpei Zhang. 2024. “Satellite Video Single Object Tracking: A Systematic Review and an Oriented Object Tracking Benchmark.” ISPRS Journal of Photogrammetry and Remote Sensing 210 (April): 212–240. https://doi.org/10.1016/j.isprsjprs.2024.03.013.
  • Chen, Yuzeng, Yuqi Tang, Zhiyong Yin, Te Han, Bin Zou, and Huihui Feng. 2022. “Single Object Tracking in Satellite Videos: A Correlation Filter-Based Dual-Flow Tracker.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 15: 6687–6698. https://doi.org/10.1109/JSTARS.2022.3185328.
  • Chen, Hongruixuan, Naoto Yokoya, Chen Wu, and Bo Du. 2022. “Unsupervised Multimodal Change Detection Based on Structural Relationship Graph Representation Learning.” IEEE Transactions on Geoscience and Remote Sensing 60: 1–18. https://doi.org/10.1109/TGRS.2022.3229027.
  • Fan, Qiufeng, Fanbo Hou, and Feng Shi. 2021. “A Fusion Method for Infrared and Visible Images Based on Iterative Guided Filtering and Two Channel Adaptive Pulse Coupled Neural Network.” International Journal of Image and Data Fusion 12 (1): 23–47. https://doi.org/10.1080/19479832.2020.1814877.
  • Gong, Maoguo, Xudong Niu, Tao Zhan, and Mingyang Zhang. 2019. “A Coupling Translation Network for Change Detection in Heterogeneous Images.” International Journal of Remote Sensing 40 (9): 3647–3672. https://doi.org/10.1080/01431161.2018.1547934.
  • Gong, Maoguo, Xudong Niu, Puzhao Zhang, and Zhetao Li. 2017. “Generative Adversarial Networks for Change Detection in Multispectral Imagery.” IEEE Geoscience and Remote Sensing Letters 14 (12): 2310–2314. https://doi.org/10.1109/LGRS.2017.2762694.
  • Han, Te, Yuqi Tang, and Yuzeng Chen. 2022. “Heterogeneous Image Change Detection Based on Two-Stage Joint Feature Learning.” In IGARSS 2022 - 2022 IEEE International Geoscience and Remote Sensing Symposium, 3215–3218. https://doi.org/10.1109/IGARSS46834.2022.9883323.
  • Han, Te, Yuqi Tang, Xin Yang, Zefeng Lin, Bin Zou, and Huihui Feng. 2021. “Change Detection for Heterogeneous Remote Sensing Images with Improved Training of Hierarchical Extreme Learning Machine (HELM).” Remote Sensing 13 (23): 4918. https://doi.org/10.3390/rs13234918.
  • Han, Te, Yuqi Tang, Bin Zou, and Huihui Feng. 2024. “Unsupervised Multimodal Change Detection Based on Adaptive Optimization of Structured Graph.” International Journal of Applied Earth Observation and Geoinformation 126 (February): 103630. https://doi.org/10.1016/j.jag.2023.103630.
  • Hong, Danfeng, Jing Yao, Chenyu Li, Deyu Meng, Naoto Yokoya, and Jocelyn Chanussot. 2023. “Decoupled-and-Coupled Networks: Self-Supervised Hyperspectral Image Super-Resolution With Subpixel Fusion.” IEEE Transactions on Geoscience and Remote Sensing 61: 1–12. https://doi.org/10.1109/TGRS.2023.3324497.
  • Hong, Danfeng, Naoto Yokoya, Jocelyn Chanussot, and Xiao Xiang Zhu. 2019. “An Augmented Linear Mixing Model to Address Spectral Variability for Hyperspectral Unmixing.” IEEE Transactions on Image Processing 28 (4): 1923–1938. https://doi.org/10.1109/TIP.2018.2878958.
  • Hong, Danfeng, Bing Zhang, Xuyang Li, Yuxuan Li, Chenyu Li, Jing Yao, Naoto Yokoya, et al. 2024. “SpectralGPT: Spectral Remote Sensing Foundation Model.” IEEE Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/TPAMI.2024.3362475.
  • Hong, Danfeng, Bing Zhang, Hao Li, Yuxuan Li, Jing Yao, Chenyu Li, Martin Werner, Jocelyn Chanussot, Alexander Zipf, and Xiao Xiang Zhu. 2023. “Cross-City Matters: A Multimodal Remote Sensing Benchmark Dataset for Cross-City Semantic Segmentation Using High-Resolution Domain Adaptation Networks.” Remote Sensing of Environment 299 (December): 113856. https://doi.org/10.1016/j.rse.2023.113856.
  • Hou, Bin, Yunhong Wang, and Qingjie Liu. 2017. “Change Detection Based on Deep Features and Low Rank.” IEEE Geoscience and Remote Sensing Letters 14 (12): 2418–2422. https://doi.org/10.1109/LGRS.2017.2766840.
  • Jimenez-Sierra, David Alejandro, Hernán Darío Benítez-Restrepo, Hernán Darío Vargas-Cardona, and Jocelyn Chanussot. 2020. “Graph-Based Data Fusion Applied To: Change Detection and Biomass Estimation in Rice Crops.” Remote Sensing 12 (17): 2683. https://doi.org/10.3390/rs12172683.
  • Liu, Jia, Maoguo Gong, Kai Qin, and Puzhao Zhang. 2018. “A Deep Convolutional Coupling Network for Change Detection Based on Heterogeneous Optical and Radar Images.” IEEE Transactions on Neural Networks and Learning Systems 29 (3): 545–559. https://doi.org/10.1109/TNNLS.2016.2636227.
  • Liu, Zhunga, Gang Li, Gregoire Mercier, You He, and Quan Pan. 2018. “Change Detection in Heterogenous Remote Sensing Images via Homogeneous Pixel Transformation.” IEEE Transactions on Image Processing 27 (4): 1822–1834. https://doi.org/10.1109/TIP.2017.2784560.
  • Liu, Guangcan, and Shuicheng Yan. 2011. “Latent Low-Rank Representation for Subspace Segmentation and Feature Extraction.” In 2011 International Conference on Computer Vision, 1615–1622. Barcelona, Spain: IEEE. https://doi.org/10.1109/ICCV.2011.6126422.
  • Luppino, Luigi Tommaso, Filippo Maria Bianchi, Gabriele Moser, and Stian Normann Anfinsen. 2019. “Unsupervised Image Regression for Heterogeneous Change Detection.” IEEE Transactions on Geoscience and Remote Sensing 57 (12): 9960–9975. https://doi.org/10.1109/TGRS.2019.2930348.
  • Luppino, Luigi Tommaso, Michael Kampffmeyer, Filippo Maria Bianchi, Gabriele Moser, Sebastiano Bruno Serpico, Robert Jenssen, and Stian Normann Anfinsen. 2022. “Deep Image Translation With an Affinity-Based Change Prior for Unsupervised Multimodal Change Detection.” IEEE Transactions on Geoscience and Remote Sensing 60: 1–22. https://doi.org/10.1109/TGRS.2021.3056196.
  • Mercier, Grégoire, Gabriele Moser, and Sebastiano Bruno Serpico. 2008. “Conditional Copulas for Change Detection in Heterogeneous Remote Sensing Images.” IEEE Transactions on Geoscience and Remote Sensing 46 (5): 1428–1441. https://doi.org/10.1109/TGRS.2008.916476.
  • Mignotte, Max. 2020. “A Fractal Projection and Markovian Segmentation-Based Approach for Multimodal Change Detection.” IEEE Transactions on Geoscience and Remote Sensing 58 (11): 8046–8058. https://doi.org/10.1109/TGRS.2020.2986239.
  • Prendes, Jorge, Marie Chabert, Frederic Pascal, Alain Giros, and Jean-Yves Tourneret. 2015. “A New Multivariate Statistical Model for Change Detection in Images Acquired by Homogeneous and Heterogeneous Sensors.” IEEE Transactions on Image Processing 24 (3): 799–812. https://doi.org/10.1109/TIP.2014.2387013.
  • Pu, Tian. 2000. “Contrast-Based Image Fusion Using the Discrete Wavelet Transform.” Optical Engineering 39 (8): 2075. https://doi.org/10.1117/1.1303728.
  • Rajakumar, C., and S. Satheeskumaran. 2022. “Singular Value Decomposition and Saliency-Map Based Image Fusion for Visible and Infrared Images.” International Journal of Image and Data Fusion 13 (1): 21–43. https://doi.org/10.1080/19479832.2020.1864786.
  • Reba, Meredith, and Karen C. Seto. 2020. “A Systematic Review and Assessment of Algorithms to Detect, Characterize, and Monitor Urban Land Change.” Remote Sensing of Environment 242 (June): 111739. https://doi.org/10.1016/j.rse.2020.111739.
  • Saha, Sudipan, Francesca Bovolo, and Lorenzo Bruzzone. 2019. “Unsupervised Deep Change Vector Analysis for Multiple-Change Detection in VHR Images.” IEEE Transactions on Geoscience and Remote Sensing 57 (6): 3677–3693. https://doi.org/10.1109/TGRS.2018.2886643.
  • Sun, Yuli, Lin Lei, Dongdong Guan, and Gangyao Kuang. 2021. “Iterative Robust Graph for Unsupervised Change Detection of Heterogeneous Remote Sensing Images.” IEEE Transactions on Image Processing 30: 6277–6291. https://doi.org/10.1109/TIP.2021.3093766.
  • Sun, Yuli, Lin Lei, Dongdong Guan, Ming Li, and Gangyao Kuang. 2022. “Sparse-Constrained Adaptive Structure Consistency-Based Unsupervised Image Regression for Heterogeneous Remote-Sensing Change Detection.” IEEE Transactions on Geoscience and Remote Sensing 60: 1–14. https://doi.org/10.1109/TGRS.2021.3110998.
  • Sun, Yuli, Lin Lei, Xiao Li, Hao Sun, and Gangyao Kuang. 2021. “Nonlocal Patch Similarity Based Heterogeneous Remote Sensing Change Detection.” Pattern Recognition 109 (January): 107598. https://doi.org/10.1016/j.patcog.2020.107598.
  • Sun, Yuli, Lin Lei, Xiao Li, Xiang Tan, and Gangyao Kuang. 2022. “Structure Consistency-Based Graph for Unsupervised Change Detection With Homogeneous and Heterogeneous Remote Sensing Images.” IEEE Transactions on Geoscience and Remote Sensing 60: 1–21. https://doi.org/10.1109/TGRS.2021.3053571.
  • Tang, Huakang, Honglei Wang, and Xiaoping Zhang. 2022. “Multi-Class Change Detection of Remote Sensing Images Based on Class Rebalancing.” International Journal of Digital Earth 15 (1): 1377–1394. https://doi.org/10.1080/17538947.2022.2108921.
  • Tang, Yuqi, Xin Yang, Te Han, Fangyan Zhang, Bin Zou, and Huihui Feng. 2024. “Enhanced Graph Structure Representation for Unsupervised Heterogeneous Change Detection.” Remote Sensing 16 (4): 721. https://doi.org/10.3390/rs16040721.
  • Tang, Yuqi, and Liangpei Zhang. 2017. “Urban Change Analysis with Multi-Sensor Multispectral Imagery.” Remote Sensing 9 (3): 252. https://doi.org/10.3390/rs9030252.
  • Touati, Redha, and Max Mignotte. 2018. “An Energy-Based Model Encoding Nonlocal Pairwise Pixel Interactions for Multisensor Change Detection.” IEEE Transactions on Geoscience and Remote Sensing 56 (2): 1046–1058. https://doi.org/10.1109/TGRS.2017.2758359.
  • Touati, Redha, Max Mignotte, and Mohamed Dahmane. 2019. “Multimodal Change Detection Using a Convolution Model-Based Mapping.” In 2019 Ninth International Conference on Image Processing Theory, Tools and Applications (IPTA), 1–6. Istanbul, Turkey: IEEE. https://doi.org/10.1109/IPTA.2019.8936127.
  • Touati, Redha, Max Mignotte, and Mohamed Dahmane. 2020. “Multimodal Change Detection in Remote Sensing Images Using an Unsupervised Pixel Pairwise-Based Markov Random Field Model.” IEEE Transactions on Image Processing 29: 757–767. https://doi.org/10.1109/TIP.2019.2933747.
  • Wan, Ling, Yuming Xiang, and Hongjian You. 2019. “A Post-Classification Comparison Method for SAR and Optical Images Change Detection.” IEEE Geoscience and Remote Sensing Letters 16 (7): 1026–1030. https://doi.org/10.1109/LGRS.2019.2892432.
  • Wan, Ling, Tao Zhang, and Hongjian You. 2018. “Multi-Sensor Remote Sensing Image Change Detection Based on Sorted Histograms.” International Journal of Remote Sensing 39 (11): 3753–3775. https://doi.org/10.1080/01431161.2018.1448481.
  • Wang, Zhou, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. 2004. “Image Quality Assessment: From Error Visibility to Structural Similarity.” IEEE Transactions on Image Processing 13 (4): 600–612. https://doi.org/10.1109/TIP.2003.819861.
  • Wang, Decheng, Feng Zhao, Hui Yi, Yinan Li, and Xiangning Chen. 2022. “An Unsupervised Heterogeneous Change Detection Method Based on Image Translation Network and Post-Processing Algorithm.” International Journal of Digital Earth 15 (1): 1056–1080. https://doi.org/10.1080/17538947.2022.2092658.
  • Wei, Xuexin, Yang Liu, Lin Qi, Jilong Chen, Guoqin Wang, Linxiu Zhang, and Ronggao Liu. 2023. “Monitoring Forest Dynamics in Africa during 2000–2020 Using a Remotely Sensed Fractional Tree Cover Dataset.” International Journal of Digital Earth 16 (1): 2212–2232. https://doi.org/10.1080/17538947.2023.2220613.
  • Wu, Yue, Jiaheng Li, Yongzhe Yuan, A. K. Qin, Qi-Guang Miao, and Mao-Guo Gong. 2022. “Commonality Autoencoder: Learning Common Features for Change Detection from Heterogeneous Images.” IEEE Transactions on Neural Networks and Learning Systems 33 (9): 4257–4270. https://doi.org/10.1109/TNNLS.2021.3056238.
  • Zhan, Tao, Maoguo Gong, Xiangming Jiang, and Shuwei Li. 2018. “Log-Based Transformation Feature Learning for Change Detection in Heterogeneous Images.” IEEE Geoscience and Remote Sensing Letters 15 (9): 1352–1356. https://doi.org/10.1109/LGRS.2018.2843385.
  • Zhan, Tao, Maoguo Gong, Jia Liu, and Puzhao Zhang. 2018. “Iterative Feature Mapping Network for Detecting Multiple Changes in Multi-Source Remote Sensing Images.” ISPRS Journal of Photogrammetry and Remote Sensing 146 (December): 38–51. https://doi.org/10.1016/j.isprsjprs.2018.09.002.
  • Zhao, Wei, Zhirui Wang, Maoguo Gong, and Jia Liu. 2017. “Discriminative Feature Learning for Unsupervised Change Detection in Heterogeneous Images Based on a Coupled Neural Network.” IEEE Transactions on Geoscience and Remote Sensing 55 (12): 7066–7080. https://doi.org/10.1109/TGRS.2017.2739800.