Research Article

Hybrid approach using deep learning and graph comparison for building change detection

Article: 2220525 | Received 19 Jan 2023, Accepted 29 May 2023, Published online: 06 Jun 2023

ABSTRACT

Existing methods of detecting building changes from very-high-resolution (VHR) images are limited by positional displacement. Although various change detection (CD) methods, including deep learning methods, have been proposed, they are incapable of overcoming the aforementioned limitation. Therefore, this study proposes a two-step hybrid approach using deep learning and graph comparison to detect building changes in VHR temporal images. First, the building objects were detected using mask regional-convolutional neural networks (Mask R-CNN), wherein the centroid of the bounding box was extracted as the building node. Second, for each image, graphs were generated using the extracted building nodes. Accordingly, the changed nodes were identified based on iterative graph comparison, which could be automatically halted without setting thresholds by examining the changes in the proposed index while sequentially eliminating the building changes. To demonstrate the effectiveness of the proposed method, we experimentally tested it on simulated images with synthetic changes and positional displacements. The results verified that the proposed method effectively reduced the false detections originating from positional inconsistencies. Consequently, the proposed method could overcome the limitations of conventional CD methods by employing a graph model based on the connectivity between adjacent buildings.

1. Introduction

Building change detection (CD) using very-high-resolution (VHR) temporal images is crucial in land-use planning, map database updating, and disaster assessment (Tian, Cui, and Reinartz Citation2014). However, it presents critical issues related to geometric registration errors and inaccurate radiation correction (Huang, Zhang, and Zhu Citation2014; Ji et al. Citation2019). In particular, positional inconsistencies between VHR temporal images are primarily caused by the variations in the camera position, viewing angle, or atmospheric conditions, which can be overcome using additional high-precision digital surface models or true orthorectification processes (Chen et al. Citation2021).

In general, CD methods can be classified into pixel- and object-based approaches. Pixel-based techniques such as change vector analysis (CVA) can extract the changed pixels for quantitative analysis. However, they cannot be readily applied to VHR images with positional inconsistencies because they use individual pixels as basic units, while neglecting the spatial context of an image (Han et al. Citation2020). As such, pixel-based methods for building CD focus on the spectral analysis of bitemporal images and were originally developed for medium-resolution satellite images. Furthermore, these methods require additional data such as building height information to extract the building changes (Tian, Cui, and Reinartz Citation2014).

In contrast, object-based approaches are more suitable for VHR images because they generate objects using adjacent pixels with similar characteristics such as spectral value, texture, and spatial relationship (Han et al. Citation2020). However, object-based image analysis (OBIA) depends on segmentation accuracy and does not highlight the “from – to” changes (Hussain et al. Citation2013). Although various objects may be extracted from multitemporal VHR images based on shadows and distorted shapes, the changes cannot be easily identified in case of positional inconsistencies because matching the corresponding objects such as building pairs becomes difficult during comparisons. To surpass these limitations, Xiao et al. (Citation2017) developed a novel object-based method for building CD, which segmented the bitemporal images using graph-based energy minimization. Although this method achieved improved results, it could only tolerate registration errors within one pixel and was prone to generating false alarms.

Recently, deep learning-based approaches that learn changed information from a dataset containing bitemporal images have garnered attention for CD tasks. To date, various deep learning networks have been applied to VHR temporal images (Wang et al. Citation2018; Chen et al. Citation2020; Esfandiari et al. Citation2021; Jiang et al. Citation2022; Liu et al. Citation2022; Ye et al. Citation2022; Xia et al. Citation2022; Zheng et al. Citation2022). For instance, Xia et al. (Citation2022) proposed an end-to-end CD network comprising a fusion encoder, which combined a convolutional neural network with transformer features, and a decoder to reduce background noises. Furthermore, they used object edges to construct the mask features of buildings. Liu et al. (Citation2022) developed a supervised domain-adaptation framework that effectively alleviated the domain shift between bitemporal images by extracting domain-invariant building features to align various feature distributions across the feature space. Zheng et al. (Citation2022) proposed a high-frequency attention Siamese network for building CD from VHR images to obtain high-frequency patterns using the deep learning pipeline and more accurately detect the edges of changed buildings. To extract changes in buildings with dense layouts, Ye et al. (Citation2022) used a feature decomposition – optimization–reorganization network that modeled the main body and edge features of buildings based on similarity. Although state-of-the-art approaches have exhibited suitable performance on open CD datasets, they have not been validated on datasets with positional inconsistency.

In principle, the performance of deep learning networks relies on the quality of datasets. However, CD datasets are limited (Sun et al. Citation2022) and most CD datasets, such as LEVIR-CD (Chen and Shi Citation2020) and the WHU building dataset (Ji, Wei, and Lu Citation2019), contain orthoimages composed of bitemporal VHR images without relief displacement. Furthermore, the spatial and spectral resolution differences between a training and real dataset can produce pseudo-changes. Huang, Tang, and Qin (Citation2022) comprehensively examined the potential of PlanetScope images for 3D reconstruction and CD. Although PlanetScope images portray buildings as one of the target objects, this dataset is inappropriate for this study because the relief differences were already minimized at the time of image acquisition. Esfandiari et al. (Citation2021) addressed the issue of positional inconsistencies between buildings in temporal images by combining the preprocessing for geometric correction with a deep learning-based building CD method focusing on off-nadir images and incorporating patch-wise co-registration to overcome relief displacement. However, preprocessing geometric corrections requires additional data such as GIS building footprints or digital surface models.

As an alternative, graph-based frameworks have been proposed as efficient CD methods for VHR images (Pham, Mercier, and Michel Citation2016; Kalinicheva et al. Citation2020; Wu et al. Citation2021; Sun et al. Citation2022; N. Wang et al. Citation2022). Wu et al. (Citation2021) employed an object-based graph model to develop an unsupervised CD method for VHR images. In particular, they segmented the temporal images, constructed weighted graphs for each segmented object, and extracted the changed objects by comparing the similarity between the graphs in bitemporal optical and SAR datasets. In contrast, Kalinicheva et al. (Citation2020) adopted a graph-based approach utilizing an autoencoder with segmentation techniques for CD. In their framework, graphs were created for temporal behavior modeling and configured with the changed objects in identical areas.

Instead of focusing on the location or shape, graph models exploit the spatial context between objects and estimate the topological similarities to detect changes based on their relationships with the surrounding objects. Therefore, as an alternative, the connectivity information between the objects stored in graphs can be utilized to compensate for positional inconsistency in CD problems. However, existing graph-based CD frameworks target low-resolution remote-sensing images and are unsuitable for detecting changes in specific objects (i.e. changed buildings). Furthermore, previous graph models assuming topology consistency (Pham, Mercier, and Michel Citation2016; Wu et al. Citation2021) cannot be directly applied to images with positional inconsistency for detecting changes in buildings. Specifically, topological consistency was ensured by applying a graph configured from one image to another or by extracting graph vertices using identical patches for two distinct images. As previous approaches constructed a graph by segmenting a continuous area according to equivalent division units, these models used predetermined node pairs for comparison, i.e. previous graph models assumed the positional consistency of identical patches. However, buildings exist discretely, unlike segmented parcels or patches. Therefore, comparing graphs created by designating the centroid of buildings as nodes is challenging when temporal images contain positional differences between identical buildings. Furthermore, other existing graph-based CD frameworks (Kalinicheva et al. Citation2020; N. Wang et al. Citation2022) constructed graphs after completing CD, i.e. their graphs were configured based on the extracted changes or patterns for further analysis. Therefore, these graph models cannot be applied to the proposed approach, which utilizes the connectivity information between the objects stored in graphs as a criterion for CD.

To overcome the aforementioned limitations, this study proposes a hybrid framework for building CD that combines a deep learning process for building detection with graph structure comparison for CD. The major contributions of this study are stated as follows:

  1. The proposed approach considered the spatial relationships (connectivity between adjacent buildings) between building nodes instead of comparing the pixels of bitemporal images. Thus, compared to the existing CD methods significantly affected by domain shift, the proposed method can be more effectively applied to VHR images containing geometric inconsistencies.

  2. The data shortage issue for CD was addressed by separating the subprocesses of building detection from VHR images and CD extraction. To this end, existing building segmentation sources such as OpenStreetMap, Google Map, and Gaode Map, which provide several available building labels, can be used (Cao and Huang Citation2023).

  3. The proposed iterative approach of graph comparison was automated to enable efficient CD identification without user intervention. Furthermore, the period in which the changed building nodes exist could be specified, because the buildings were detected and the corresponding graphs were configured for each period. Therefore, the proposed method produced binary CD maps and could automatically classify new or demolished buildings between bitemporal images.

The remainder of this paper is structured as follows: the methodology of building CD is presented in Section 2. Thereafter, the experimental results and limitations of this study are discussed in Section 3. Finally, the conclusions of this study are summarized in Section 4.

2. Building CD framework

To address the challenges of CD considering positional inconsistency in VHR images, the proposed building CD framework was developed with the following characteristics.

  • Determination of changed buildings via a two-step hybrid approach: identification of buildings by applying a deep learning-based object detection method and extraction of the added and deleted buildings using graph comparison.

  • Use of connectivity information between adjacent buildings by considering the characteristics of buildings that form clusters with their surrounding buildings.

As depicted in Figure 1, the proposed CD framework comprises two steps. In step 1, a deep learning network extracts the centroids of the building objects in the bitemporal images as nodes. For each building object in the bitemporal images, bounding boxes and segmentation masks are generated by applying Mask R-CNN, and the building nodes are identified from the centroid of each bounding box. In step 2, the building changes are detected by iterative graph comparisons after constructing graphs for each timeframe. The Delaunay triangulation, a common graph model in the GIS field, can be used to configure building graphs (Yan et al. Citation2019). Using the nodes generated in step 1, building graphs are created for the two timeframes based on the Delaunay triangulation. In the iterative graph comparison process, nodes whose triangular graph shapes formed with the surrounding buildings differ between the two periods and that concurrently exhibit a significant positional difference are extracted as changed buildings. After identifying the changed building nodes, CD maps are created using the segmentation masks generated by Mask R-CNN. Details of the proposed framework are described in the following sections.

Figure 1. Proposed building change detection framework.

2.1. Building detection using Mask R-CNN

Building detection in VHR images is crucial in remote-sensing fields for performing tasks such as land mapping and post-disaster reconstruction (Han et al. Citation2022). There are various deep learning-based methods for building detection (Xu et al. Citation2018; Li et al. Citation2021; Yu et al. Citation2022; Z. Wang et al. Citation2022). In particular, R-CNN (Girshick et al. Citation2014), Fast R-CNN (Girshick Citation2015), and Faster R-CNN (Ren et al. Citation2017), which are powerful deep learning network-based models, have been used to segment building objects by predicting the bounding boxes and segmentation results of targets. Mask R-CNN (He et al. Citation2017) has been successfully applied to extract building objects from VHR images (Zhao et al. Citation2018; Han et al. Citation2022; Y. Wang et al. Citation2022; Usmani, Napolitano, and Bovolo Citation2023), and it has been widely used as an instance segmentation model for building detection (Liu et al. Citation2021). Previous studies (Zhao et al. Citation2018; Usmani, Napolitano, and Bovolo Citation2023) have effectively extracted buildings with Mask R-CNN while sharing their trained weights. As these pre-trained weights can be utilized to increase the learning efficiency, we adopted Mask R-CNN to detect building objects. The pre-trained Mask R-CNN employed in this study was trained on the SpaceNet v2 building dataset and has been shown to extract buildings with high accuracy.

This study detects building objects using a Mask R-CNN trained on SpaceNet 2: Building Detection v2, a VHR building dataset acquired from Shanghai, Paris, Khartoum, and Las Vegas (Van Etten, Lindenbaum, and Bacastow Citation2018). The centroid of the bounding box was extracted as a node. The Mask R-CNN with a ResNet-50 backbone was implemented in PyTorch using the online GPU and cloud services provided by Google Colab. Among the images provided in the SpaceNet building dataset, we randomly selected RGB-PanSharpened input images (size: 650 pixel × 650 pixel) with three bands. The numbers of training and validation images were 3000 and 1000, respectively (Table 1).
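The node-extraction step described above can be sketched as follows. This is a minimal sketch: the (y1, x1, y2, x2) bounding-box layout follows the Matterport Mask R-CNN convention, and the function name is ours.

```python
import numpy as np

def extract_building_nodes(rois):
    """Convert detected bounding boxes (y1, x1, y2, x2) into building
    nodes located at the centroid of each box."""
    rois = np.asarray(rois, dtype=float)
    ys = (rois[:, 0] + rois[:, 2]) / 2.0  # centroid row
    xs = (rois[:, 1] + rois[:, 3]) / 2.0  # centroid column
    return np.column_stack([xs, ys])      # one (x, y) node per building

# Two hypothetical detections on a 650 x 650 tile
nodes = extract_building_nodes([[10, 20, 30, 40], [100, 200, 120, 260]])
```

The resulting node array feeds directly into the graph-construction step of Section 2.2.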

Table 1. Detailed information of training, validation, and test datasets.

Furthermore, pretrained weights available for Mask R-CNN trained on the Microsoft Common Objects in Context (COCO) dataset (http://cocodataset.org/#home) and SpaceNet building datasets (https://github.com/Mstfakts/Building-Detection-MaskRCNN) were used, and all parts of the model were trained for 151 epochs. Based on the initial weights, additional training was conducted for 100 epochs. We followed the general approach for implementing Mask R-CNN and adopted most of the hyperparameters used to train on the COCO dataset. For instance, MINI_MASK_SHAPE and MASK_SHAPE were set to [56, 56] and [28, 28], respectively, to improve training speed. Furthermore, the number of classes was 2, and the batch size was set to 1. However, the hyperparameters RPN_ANCHOR_RATIOS and RPN_ANCHOR_SCALES were set to [0.25, 1, 4] and [8, 16, 32, 64, 128], respectively, which were experimentally determined and considered suitable for building images.
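For reference, the stated hyperparameters can be collected as follows. The attribute names follow the Matterport Mask R-CNN implementation; any value not stated in the text (e.g. the image shape entry) is an assumption.

```python
# Hypothetical sketch of the training configuration described above,
# in the style of the Matterport Mask R-CNN Config attributes.
MASK_RCNN_CONFIG = {
    "BACKBONE": "resnet50",
    "NUM_CLASSES": 2,                       # background + building
    "BATCH_SIZE": 1,
    "MINI_MASK_SHAPE": [56, 56],            # smaller masks speed up training
    "MASK_SHAPE": [28, 28],
    "RPN_ANCHOR_RATIOS": [0.25, 1, 4],      # experimentally tuned for buildings
    "RPN_ANCHOR_SCALES": [8, 16, 32, 64, 128],
    "IMAGE_SHAPE": [650, 650, 3],           # assumed: RGB-PanSharpened tiles
}
```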

As the SpaceNet building dataset for building detection was not constructed in a time series, the CD method could not be easily tested on temporal images. Therefore, six simulated images were generated by arbitrarily synthesizing the building changes using SpaceNet data ().

The building roof in the original image was removed using an image editing program, and the corresponding pixels were replaced with brightness values similar to those of the surrounding pixels. Furthermore, to emphasize the domain shift in bitemporal images, we deliberately induced positional displacements by shifting the center of the simulated image. The shift error is one of the most representative geometric errors (Dave, Joshi, and Srivastava Citation2015) contributing to false CD detections in VHR temporal images. Thereafter, the generated images – same size as the original images – were aligned in rows and columns, and displacements up to a maximum length of 15 pixels were randomly introduced.
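The displacement simulation described above can be sketched as follows. This is a minimal sketch (function name ours): vacated border pixels are filled with zeros here, whereas the actual simulated images keep realistic content at the borders.

```python
import numpy as np

def shift_image(image, max_shift=15, rng=None):
    """Simulate positional displacement by shifting the image content by a
    random offset of up to max_shift pixels along rows and columns.
    The output keeps the input size; vacated pixels are zero-filled."""
    rng = np.random.default_rng(rng)
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    shifted = np.zeros_like(image)
    # Source and destination windows for the chosen offset.
    src_y = slice(max(0, -dy), image.shape[0] - max(0, dy))
    src_x = slice(max(0, -dx), image.shape[1] - max(0, dx))
    dst_y = slice(max(0, dy), image.shape[0] - max(0, -dy))
    dst_x = slice(max(0, dx), image.shape[1] - max(0, -dx))
    shifted[dst_y, dst_x] = image[src_y, src_x]
    return shifted, (dy, dx)

img = np.arange(25).reshape(5, 5)
shifted, (dy, dx) = shift_image(img, max_shift=2, rng=0)
```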

The image center displaced from A to A′ is portrayed in Figure 2a. The CD results obtained using CVA in the absence and presence of positional difference are illustrated in Figure 2b,c, respectively. Without positional displacement, the building changes could be extracted using the pixel-based CD method; however, with positional error, the building changes could not be appropriately detected in the same images. These results confirmed the significant effect of positional inconsistencies on CD results.

Figure 2. Positional displacement to emphasize geometric variations: (a) image shifting example and CD results between the original and simulated image (b) without positional displacement and (c) with positional displacement.

2.2. CD using building graph comparison

In an image with shift errors, the building location, even for identical buildings, may not be consistent across multiple timeframes owing to differences in the camera position and viewing angle, i.e. the precise location of buildings in VHR temporal images cannot be guaranteed. Therefore, building changes cannot be detected solely by overlapping the detected buildings in two different temporal images. In addition, the connections between adjacent buildings change due to the construction or demolition of buildings. Therefore, an area with building changes can be identified by detecting variations in the connectivity between the buildings. As most buildings exist in clusters (Li and Wen Citation2017), a graph can be configured based on the connectivity between adjacent buildings. Specifically, each building is abstracted as a node, and its connectivity with adjacent buildings is abstracted as links. For each period, undirected building graphs are created using Delaunay triangulation, and nodes are added at the four corners of the image to prevent connections with distant outermost building nodes that cannot be regarded as neighbors. Thereafter, the building graph is created, including these corner nodes. The graphs from varying periods are compared to detect changes in the connectivity of a building with its neighboring buildings.
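The graph-construction step above can be sketched as follows, assuming SciPy is available; the function name is ours.

```python
import numpy as np
from scipy.spatial import Delaunay

def build_building_graph(nodes, image_size):
    """Construct an undirected building graph via Delaunay triangulation.
    Corner nodes are appended so that outermost buildings are not linked
    across the image to distant buildings that are not true neighbors."""
    h, w = image_size
    corners = np.array([[0, 0], [w - 1, 0], [0, h - 1], [w - 1, h - 1]], float)
    points = np.vstack([np.asarray(nodes, float), corners])
    tri = Delaunay(points)
    # Collect the undirected edges of every triangle (links of the graph).
    edges = set()
    for simplex in tri.simplices:
        for a, b in ((0, 1), (1, 2), (0, 2)):
            edges.add(tuple(sorted((simplex[a], simplex[b]))))
    return points, tri.simplices, edges

points, simplices, edges = build_building_graph(
    [[100, 100], [200, 150], [150, 300]], (650, 650))
```

One such graph is built per period; the triangles (`simplices`) are the units compared in the next step.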

New connections with neighboring buildings appear when a new building is constructed, whereas those between the existing buildings change when any building is demolished. Therefore, the structure of a building graph inevitably changes with the changed buildings. Accordingly, the added or deleted buildings can be extracted by detecting their nodes in the changed graph area. The change index Ca, for which a smaller value corresponds to a greater likelihood of building change, is defined using a linear combination of graph and positional similarity as follows:

$$C_a = w_1\left[\frac{1}{N_a}\sum_{i=1}^{N_a}\max_{j=1,\dots,M}\left(S_{ij}+O_{ij}\right)\right] + w_2\max_{b=1,\dots,V}P_{ab}, \qquad a = 1,\dots,U \tag{1}$$

where $U$ and $V$ denote the total number of nodes in the target and compared periods, respectively (if $T_1$ denotes the target period, then $T_2$ is the compared period, and vice-versa), $N_a$ denotes the number of intersecting triangles for node $a$, and $M$ indicates the total number of triangles in the compared period. The shape ratio similarity $S_{ij}$, overlap similarity $O_{ij}$, and positional similarity $P_{ab}$ are calculated according to Kim and Yu (Citation2015). Note that $S_{ij} = 1 - \mathrm{Diffs}_{ij}/\max(\mathrm{Diffs}_{all})$, where $\mathrm{Diffs}_{ij} = \left|\frac{\mathrm{perimeter}_i}{2\sqrt{\pi\,\mathrm{area}_i}} - \frac{\mathrm{perimeter}_j}{2\sqrt{\pi\,\mathrm{area}_j}}\right|$; $O_{ij} = 1 - \mathrm{Diffo}_{ij}/\max(\mathrm{Diffo}_{all})$, where $\mathrm{Diffo}_{ij} = (\mathrm{area}_{i\cup j} - \mathrm{area}_{i\cap j})/(\mathrm{area}_i + \mathrm{area}_j)$; and $P_{ab} = 1 - \mathrm{Diffp}_{ab}/\max(\mathrm{Diffp}_{all})$, where $\mathrm{Diffp}_{ab} = \sqrt{(X_a - X_b)^2 + (Y_a - Y_b)^2}$. The shape ratio of polygons can be expressed using their perimeter and area. As $\mathrm{Diffs}_{ij}$ decreases for more similar shapes, $S_{ij}$ represents the similarity in the shapes of the two polygons. Similarly, two objects will entirely overlap if they are identical. Therefore, $O_{ij}$, which is an index indicating the overlapping of two objects, increases for similar objects. Furthermore, a shorter distance between nodes $a$ and $b$ corresponds to a higher positional similarity $P_{ab}$.

The graph similarity (GS), expressed in square brackets in equation (1), indicates the similarity of the subgraph of building $a$ in the target period with that of the matching-candidate building in the compared period. The graph-matching problem involves determining pairs of corresponding nodes (Feng et al. Citation2013). Specifying the corresponding node pairs becomes challenging when graph pairs contain geometric errors. Therefore, instead of specifying node pairs, GS is calculated by comparing triangles configured by the connections between the nodes. When a building is deleted, neighboring building nodes form new triangles with other nodes. When a building is created, adjacent building nodes form a new triangle with the new building node, sometimes even by terminating their existing connections. As building nodes are deleted or created, the triangles formed by the building nodes vary as well. Therefore, the building GS is determined by evaluating the similarity of the triangles in the target and compared periods, referring to the polygon similarity index presented by Kim and Yu (Citation2015). Unlike earlier methods of polygon similarity, GS only combines the shape ratio and overlap similarities; notably, it disregards positional similarity to separately reflect the positional difference between the building nodes. As matching pairs between the triangles in the target and compared periods are unspecified, the maximum similarity between triangle $i$ in the target period and triangle $j$ in the compared period ($\max_{j=1,\dots,M}(S_{ij}+O_{ij})$ in equation (1)) is defined as GS. Moreover, each building can configure a set of multiple triangles based on its positional relationship with the adjacent buildings. Therefore, the GS of building $a$ is determined as the average similarity value of all triangles containing the node $a$. Finally, the GS is linearly combined with the positional similarity between a candidate pair of identical buildings in two different timeframes, separated by the minimum distance ($\max_{b=1,\dots,V}P_{ab}$ in equation (1)).
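The similarity terms and the linear combination in equation (1) can be sketched as follows. This is a minimal sketch in Python: the function names are ours, and the overlap similarity $O_{ij}$, which requires polygon intersection areas, is assumed to have been precomputed and folded into the per-triangle graph-similarity values.

```python
import numpy as np

def shape_ratio(perimeter, area):
    # Compactness-style shape ratio: perimeter / (2 * sqrt(pi * area));
    # equals 1 for a circle, the most compact shape.
    return perimeter / (2.0 * np.sqrt(np.pi * area))

def similarity_from_diffs(diffs):
    # Normalize raw differences into [0, 1] similarities: 1 - diff / max(diff).
    diffs = np.asarray(diffs, float)
    m = diffs.max()
    return np.ones_like(diffs) if m == 0 else 1.0 - diffs / m

def change_index(gs_per_triangle, pos_sim, w1=2/3, w2=1/3):
    """Change index Ca of equation (1): the average graph similarity of the
    Na triangles containing node a, linearly combined with the maximum
    positional similarity over candidate nodes in the compared period."""
    gs = np.mean(gs_per_triangle)          # (1/Na) * sum of max(S + O)
    return w1 * gs + w2 * np.max(pos_sim)  # + w2 * max Pab
```

A smaller `change_index` value marks a node as a more likely building change.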

To identify the changed building nodes using iterative graph comparison, two building graphs in T1 and T2 are overlaid for comparison; then, Ca is evaluated using equation (1), as depicted in Figure 3. As the total number of buildings in the two periods may differ, Ca is calculated for each period. Initially, the Ca of all nodes in T1 is calculated considering T1 and T2 as the target and compared periods, respectively. Thereafter, the Ca of all nodes in T2 is calculated using T2 as the target period. The building with the smallest Ca between both periods is identified as a changed building. If it exists in T1, then it is classified as a deletion; otherwise, it is classified as an addition. The following process is repeated until all building changes are detected: the building with the smallest Ca is deleted (e.g. node “A” in Figure 3), the graph is reconstructed, and Ca is reevaluated. If the corner nodes bear the smallest Ca, they are excluded from deletion during the iterative process. The building with the smallest Ca, which makes the two temporal graphs different, is removed at each stage. As the graphs of the two periods become increasingly similar, the average Ca increases, and the standard deviation of Ca decreases. However, the similarity between the two graphs decreases when buildings are incorrectly deleted. Thus, in case of an unnecessary deletion after detecting all building changes, the average Ca decreases and the standard deviation of Ca increases. Therefore, the iteration is executed until the stage just before the average Ca starts to decrease and the standard deviation of Ca starts to increase. To determine the end stage of the iteration, the average and standard deviation of all Ca values from buildings in T1 and T2 are calculated independent of timeframes. For instance, if T1 contains two buildings and T2 contains three with the newly constructed building, the average and standard deviation of Ca from all five buildings are calculated, and the sum of Ca values for all buildings in T1 and T2 is divided by the total number of buildings in T1 and T2 to calculate the average Ca. Because only a single changed building is detected at each stage, all extracted buildings are merged with relation to their timeframes to be classified as deleted or added buildings. Consequently, all extracted building changes in T1 and T2 are considered to be deleted and newly constructed buildings, respectively. The iterative graph comparison process is represented in Figure 4.
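The iteration described above can be sketched as follows. This is a simplified sketch (function names ours): the change-index computation is passed in as a callback, corner nodes are assumed to be excluded by the caller, and the stopping rule is implemented by undoing the single deletion that first makes the average Ca decrease while the standard deviation increases.

```python
import numpy as np

def iterative_graph_comparison(nodes_t1, nodes_t2, compute_ca):
    """At each stage, remove the node with the smallest change index Ca as a
    changed building. compute_ca(nodes_t1, nodes_t2) must return one
    (Ca, period, node) tuple per remaining node in both periods."""
    changed = []
    prev_mean, prev_std = -np.inf, np.inf
    while nodes_t1 or nodes_t2:
        scores = compute_ca(nodes_t1, nodes_t2)
        values = [ca for ca, _, _ in scores]
        mean, std = np.mean(values), np.std(values)
        if mean < prev_mean and std > prev_std:
            # The previous deletion made the graphs *less* similar:
            # restore that node and stop the iteration.
            period, node = changed.pop()
            (nodes_t1 if period == 1 else nodes_t2).append(node)
            break
        prev_mean, prev_std = mean, std
        _, period, node = min(scores, key=lambda s: s[0])
        (nodes_t1 if period == 1 else nodes_t2).remove(node)
        changed.append((period, node))  # period 1: deleted, period 2: added
    return changed

# Toy demonstration with scalar "nodes" and a distance-based mock index.
def _mock_ca(t1, t2):
    def sim(n, other):
        return max(1 - abs(n - m) / 10 for m in other) if other else 0.0
    return [(sim(n, t2), 1, n) for n in t1] + [(sim(n, t1), 2, n) for n in t2]

t1, t2 = [1, 2, 3, 10], [1, 2, 3]
changed = iterative_graph_comparison(t1, t2, _mock_ca)
```

In the toy run, the outlier node 10 of T1 is extracted as a deleted building, and an unnecessary further deletion is rolled back by the stopping rule.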

Figure 3. Descriptive illustration of graph comparison (building graph in T1 is plotted with solid lines and circle nodes, and the graph in T2 is represented with dotted lines and diamond nodes).

Figure 4. Proposed iterative graph comparison process for building change detection (Ca denotes the change index; T1 and T2 represent two distinct periods).

3. Experimental results

3.1. Test sites

The building graph structure is influenced by the inherent characteristics of the buildings (e.g. size and shape) and their distribution (e.g. clustered distribution, distance from neighboring buildings, and the number of buildings in an image), which may consequently affect the performance of the proposed building CD framework. Herein, we selected six sites with varying numbers of buildings, shapes, and distribution characteristics to apply the proposed framework and comparatively evaluate its CD performance. Specifically, sites A and B contained more buildings than sites C–F. Site A housed relatively uniform-sized buildings, whereas site B comprised several large buildings. Sites C and E were covered by a small number of sparsely located buildings. Notably, the buildings in site E occupied a minuscule portion of the image (i.e. the proportion of building pixels to total pixels of the image was extremely low) compared to those in site C. The buildings formed clusters at sites D and E; particularly, the former included building clusters in blocks along roads. Site F contained large and long buildings, with longer distances between adjacent buildings (i.e. the distance between building centroids).

Based on the selected test images, synthetic images were generated, as depicted in Figure 5. The six test sets contained the building changes, and the T2 images included the positional displacement. Most changed buildings were reflected by removing buildings from the T2 images; however, for site E, buildings were simultaneously added and deleted. The ground-truth images represent the building changes between the two periods.

Figure 5. Test sets and ground truth for sites (a) A, (b) B, (c) C, (d) D, (e) E, and (f) F. Each site exhibits the original and simulated images. Images on the first and second rows were obtained from periods T1 and T2, respectively.

3.2. Results of building detection

Mask R-CNN was applied to the original and simulated images obtained from the six sites, including building changes and center position displacement (Figure 6). Most buildings were extracted using Mask R-CNN, and the center of the bounding box was extracted as a building node. However, false detections occurred when two independent buildings were recognized as a single building (), for which a single node was extracted at their combined centroid. As the performance of deep learning networks (e.g. Mask R-CNN) depends on the amount of training data, the false detection rate can be reduced by increasing the extent of training data for various building types.

Figure 6. Building detection results for sites (a) A, (b) B, (c) C, (d) D, (e) E, and (f) F. The top and bottom images were obtained from periods T1 and T2, respectively.

3.3. Building CD results

3.3.1 Experimental specification

The centroids of the buildings detected by the trained model were extracted as nodes and four corner points of the image were added. Thereafter, a building graph was created using Delaunay triangulation. Finally, the proposed comparison process was applied to the building graphs constructed from the images captured at two distinct periods to detect the added or deleted buildings.

In equation (1), w1 and w2 denote the weights of the building graph similarity and the positional similarity of the building nodes, respectively. Specifically, if w1 > w2 (i.e. when the influence of GS is more prominent than positional similarity), the structural differences in the graph formed by buildings are captured more sensitively. Conversely, with a higher value of w2, the positional difference for the same building (which should be in the identical location) between two temporal images is reflected more significantly. Therefore, it is more effective to adjust the ratio of these two similarities (i.e. graph and positional) based on the degree of positional inconsistency in the input images instead of using identical weights. Herein, we tested three weight combinations with varying ratios to evaluate the influence of w1 and w2, while determining the optimal value for the test set, as depicted in Figure 7. For identical weight values, the iteration was terminated although different graph regions persisted (i.e. changed buildings remained), as indicated by region A in Figure 7a. Thus, the influence of GS should be improved by increasing w1. Contrastingly, if the influence of positional similarity was greater than that of GS, an identical building node pair with a slightly different position was incorrectly detected as a changed building (region B in Figure 7b). The positional difference between identical building nodes in the two periods can be significant for VHR images with shifts. Thus, the influence of positional similarity reflected in w2 increases excessively, which is inappropriate. Consequently, for the test set, w1 and w2 in equation (1) were empirically set as 2/3 and 1/3, respectively.

Figure 7. Final result after completing the iteration process with various weights: (a) w1 = 1/2 and w2 = 1/2, (b) w1 = 1/3 and w2 = 2/3, (c) w1 = 2/3 and w2 = 1/3.


At each stage, the building node with the smallest Ca value was extracted as a changed building. The results of the iterative graph comparison and the average score and standard deviation at each stage for site C are plotted in Figures 8 and 9, respectively. Although the graph structures of the two periods differed in areas with building changes, their similarity increased as the changed buildings were deleted over the progressing iterations (Figure 8). As depicted in Figure 9, the iteration ended at the inflection point of the average and standard deviation curves, and the nodes removed at each stage were defined as changed buildings.
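The iterative extraction can be sketched as below. This is a deliberately simplified stand-in: the `min_gain` threshold replaces the paper's inflection-point test on the average-score curve, and recomputation of Ca after each removal is omitted for brevity:

```python
import statistics

def iterative_change_extraction(scores, min_gain=0.05):
    """Greedy sketch: repeatedly remove the node with the smallest combined
    score Ca while the average score of the remaining nodes keeps improving.

    scores: dict mapping node id -> Ca value.
    Returns the removed (changed) nodes and the mean-score history per stage.
    """
    scores = dict(scores)
    changed = []
    mean_history = [statistics.mean(scores.values())]
    while len(scores) > 1:
        worst = min(scores, key=scores.get)
        remaining = {k: v for k, v in scores.items() if k != worst}
        new_mean = statistics.mean(remaining.values())
        # Stop once removing another node barely raises the average score,
        # a stand-in for the inflection point on the score curve.
        if new_mean - mean_history[-1] < min_gain:
            break
        changed.append(worst)
        scores = remaining
        mean_history.append(new_mean)
    return changed, mean_history
```

With one clearly dissimilar node (low Ca) among otherwise similar nodes, only that node is removed before the curve flattens and the loop halts.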

Figure 8. Changed building node extraction in each iteration stage.


Figure 9. Changes in average score and standard deviation by stage (red circle represents the end point of iteration).


3.3.2 Evaluation of experimental results

Furthermore, to analyze the performance of the proposed method, we calculated the overall accuracy and the omission and commission errors depending on whether the changed or unchanged buildings were extracted. The accuracy measures of the proposed CD method are summarized in Table 2. The overall accuracies for sites A–F were 1, 0.5, 0.5, 0.7, 1, and 0.5, respectively. For sites A and E, an overall accuracy of 1 indicates that all the changed buildings in the image were accurately extracted as added or deleted buildings. For sites B, C, and F, all changed buildings were extracted; however, several unchanged buildings were included as changed buildings, thereby increasing the commission error to 0.5. At site D, where the highest number of buildings were changed, all the extracted buildings were indeed changed buildings, but several changed buildings were missed in the final detection. Notably, the omission and commission errors were predominantly caused by errors induced in the building detection phase: although a building existed across both periods, it was sometimes extracted in only one of them, which was then misinterpreted as a change.
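Under standard error-matrix definitions (which the paper appears to follow), the reported measures can be computed as follows; the example counts mimic a site-B-like case, reproducing an overall accuracy of 0.5 with a commission error of 0.5:

```python
def cd_accuracy(tp, fn, fp, tn):
    """Object-level accuracy measures for building change detection.

    tp: changed buildings correctly extracted
    fn: changed buildings missed          -> drives the omission error
    fp: unchanged buildings extracted     -> drives the commission error
    tn: unchanged buildings correctly ignored
    """
    total = tp + fn + fp + tn
    overall = (tp + tn) / total if total else 0.0
    omission = fn / (tp + fn) if (tp + fn) else 0.0
    commission = fp / (tp + fp) if (tp + fp) else 0.0
    return overall, omission, commission
```

For example, two changed buildings both extracted plus two unchanged buildings wrongly extracted gives `cd_accuracy(2, 0, 2, 0)`: overall accuracy 0.5, omission 0, commission 0.5.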

Table 2. Error matrix of building CD.

Moreover, we performed a pixel-based accuracy assessment for comparison with other algorithms. The results generated by the various CD methods are presented in Figure 10, and the results of the pixel-based CD accuracy assessment are listed in Tables 3–8. To demonstrate the effectiveness of the proposed method, we used representative pixel-based methods, namely change vector analysis (CVA) and iteratively reweighted multivariate alteration detection (IR-MAD). Furthermore, the IR-MAD results reclassified into object units were compared following the object-based image analysis (OBIA) approach. Additionally, as an ablation study, CD results were generated using only the Mask R-CNN outputs of the bitemporal images to validate the effectiveness of the graph application. The segmentation masks of the building objects corresponding to the changed nodes were extracted as the final CD results for comparing the performances of the various methods. In this experiment, we comparatively analyzed various post-classification approaches for CD. Although various deep learning networks are available for CD, their results could not be compared under the same conditions because this study used different datasets for change and building detection; for instance, Mask R-CNN was trained using the SpaceNet building segmentation dataset.
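For reference, the CVA baseline used in the comparison reduces to thresholding the per-pixel magnitude of the spectral difference vector; a minimal implementation might look like this (the threshold choice is an assumption, as the paper does not specify its CVA parameters):

```python
import numpy as np

def cva_change_map(img_t1, img_t2, threshold):
    """Change vector analysis (CVA) sketch: threshold the per-pixel magnitude
    of the spectral difference vector into a binary change map.

    img_t1, img_t2: co-registered arrays of shape (H, W, bands).
    """
    diff = img_t2.astype(float) - img_t1.astype(float)
    magnitude = np.sqrt((diff ** 2).sum(axis=-1))   # length of the change vector
    return magnitude > threshold
```

Because CVA compares pixels directly, any positional displacement between the two images produces pseudo-change pixels along building edges, which is precisely the failure mode the graph-based comparison avoids.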

Figure 10. Binary change maps generated by various methods such as CVA, IR-MAD, IR-MAD with OBIA, Mask R-CNN, and the proposed method for sites A to F.


Table 3. Pixel-based accuracy assessment of site A.

Table 4. Pixel-based accuracy assessment of site B.

Table 5. Pixel-based accuracy assessment of site C.

Table 6. Pixel-based accuracy assessment of site D.

Table 7. Pixel-based accuracy assessment of site E.

Table 8. Pixel-based accuracy assessment of site F.

After removing three buildings at site A, the building changes were detected by the proposed method. As listed in Table 3, the F1 score of the proposed method was greater than that of the other methods. Because positional displacement occurred between the temporal images, the changes were difficult to detect using pixel-based methods. The CD results obtained using Mask R-CNN exhibited higher recall because all the buildings segmented as changed buildings were extracted. However, numerous pseudo-changed pixels arose from the positional displacement, which decreased the precision in all experimental images. No actual changes occurred at sites B and C; however, the proposed method extracted several nodes as changed nodes because a few buildings were extracted from only one period despite existing in both periods. Although the accuracies for site C were relatively low compared with the other sites owing to false detection errors in the building extraction stage, the proposed method significantly outperformed the other algorithms (Table 5). At site D, where several buildings were removed, the proposed method detected 12 of the 17 changed buildings. The graph structure varied significantly after deleting multiple buildings, and five buildings were undetected because of the early termination of the iteration process that determines the changed nodes. As the buildings at site E were relatively smaller than those at the other sites, the deleted and added buildings could be clearly observed in the T2 image owing to the small number of buildings in the image; therefore, the proposed method could effectively extract them. Site F contained one removed building in the middle of the T2 image, and the proposed method extracted this changed building. However, the building in the upper left corner was also detected as changed even though it was not. This false detection occurred because of a detection error in the building extraction step, wherein the building in the upper left corner was extracted as two nodes. Note that, in this case, the changed node was properly extracted in the graph comparison phase. Nevertheless, buildings located at the corners may not be accurately detected if the shape of the building is partially missing.

For all test sites, the proposed method outperformed all other algorithms in terms of precision, recall, F1 score, and Kappa coefficient. As the test sets included both building changes and positional differences, the location of a building present in both periods varied even when the building was unchanged. Therefore, comparison methods such as CVA and IR-MAD exhibited low precision and higher recall, as they identify changes from pixels representing pseudo-changes. In particular, the comparative analysis with Mask R-CNN alone highlighted the significance of incorporating the graph comparison stage: the buildings of each period were correctly extracted by Mask R-CNN, yet unchanged pixels were still extracted as changed pixels owing to positional inconsistency.
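The pixel-based measures reported above can be computed from a binary confusion matrix as follows (a standard sketch, not the authors' code):

```python
def pixel_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and Cohen's kappa from binary pixel counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    n = tp + fp + fn + tn
    po = (tp + tn) / n                                            # observed agreement
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)  # chance agreement
    kappa = (po - pe) / (1 - pe) if pe != 1 else 0.0
    return precision, recall, f1, kappa
```

Unlike overall accuracy, the kappa coefficient discounts chance agreement, which matters here because unchanged pixels dominate the scenes.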

Although the performances of various deep learning algorithms have been demonstrated on benchmark datasets such as LEVIR-CD and the WHU building dataset, their performance on temporal images with positional displacement remains unverified. Specifically, deep learning algorithms that extract building shapes and compare edges (Ye et al. Citation2022; Zheng et al. Citation2022) tend to extract pseudo-changes in multitemporal images with positional inconsistencies. As various satellite sensors have been developed, multitemporal images acquired from different sensors may exhibit positional inconsistency in building roofs owing to relief displacement. Therefore, the proposed method, which extracts changed nodes using graph comparison, can be applied to multitemporal images obtained from various sensors, provided the accuracy of building detection is confirmed.

3.4. Limitations

Although the proposed CD framework is intuitive and effective, it has several limitations when applied to various CD scenarios. First, when the building changes are concentrated at the center of a target area, the overlapping (unchanged) area remains large; even if the structural difference in the graph is recognizable, the GS values decrease only marginally, which limits the detection capability. Second, if numerous building changes occur, as at site D, the two initial graph structures differ considerably; hence, depending on the order in which the building changes are detected, the iteration may be inappropriately interrupted, increasing the probability of a CD error. Third, if no building changes exist, buildings that are consistent across the two periods may nevertheless be detected simultaneously as a deletion in T1 and a creation in T2. Therefore, the proposed framework must be improved by establishing a criterion to determine whether any changes have occurred before applying the CD process. Moreover, as the proposed framework comprises two separate processes, errors induced in the building detection stage can propagate to the graph comparison stage; for instance, two adjacent buildings were occasionally extracted as a single building, or a single building was detected as two independent buildings. When such a detection error occurred in only one image, the resulting difference was incorrectly deemed a change. Nonetheless, these limitations can be mitigated by additional training with larger amounts of data covering more diverse regions.

4. Conclusions

This study proposed a hybrid framework for building CD by combining deep learning-based building detection and change extraction using graph comparison. The centroids of buildings from VHR imagery are extracted through a trained Mask R-CNN, and the building graph is constructed using Delaunay triangulation. A similarity-based iterative comparison automatically detects the regions with structural differences between the two periods. The deletions and additions of buildings are determined based on the period in which the changes are detected.

The experimental results demonstrated that the proposed method could overcome the limitations of the existing approaches. For domain-shifted synthetic images, the proposed method can detect removed buildings without producing pseudo-changes. The method is practical for VHR imagery with geometric errors because it reflects the topological relationship between each building and its neighbors using graph models, without requiring additional data and preprocessing for relief adjustment in the building CD process. Furthermore, the shortage of CD datasets can be addressed by separating the building detection and change extraction processes. In the future, if VHR images for building segmentation can be accumulated over a substantial period, the proposed building CD method can be further improved by comparing with or adopting state-of-the-art deep learning CD models, which is expected to increase the utility of the proposed method.

Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2022R1F1A1063254, 2021R1A6A3A01086427).

Disclosure statement

No potential conflict of interest was reported by the authors.

Data availability statement

The data supporting the findings of this study are available from the corresponding author, Ahram Song, upon reasonable request. SpaceNet 2: Building Detection v2 can be downloaded from https://spacenet.ai/spacenet-buildings-dataset-v2/

Additional information

Funding

The work was supported by the National Research Foundation of Korea [2022R1F1A1063254]

References

  • Cao, Y., and X. Huang. 2023. “A Full-Level Fused Cross-Task Transfer Learning Method for Building Change Detection Using Noise-Robust Pretrained Networks on Crowdsourced Labels.” Remote Sensing of Environment 284: 113371. https://doi.org/10.1016/j.rse.2022.113371.
  • Chen, H., and Z. Shi. 2020. “A Spatial-Temporal Attention-Based Method and a New Dataset for Remote Sensing Image Change Detection.” Remote Sensing 12 (10): 1662. https://doi.org/10.3390/rs12101662.
  • Chen, H., K. Zhang, W. Xiao, Y. Sheng, L. Cheng, W. Zhou, P. Wang, D. Su, L. Ye, and S. Zhang. 2021. “Building Change Detection in Very High-Resolution Remote Sensing Image Based on Pseudo-Orthorectification.” International Journal of Remote Sensing 42 (7): 2686–17. https://doi.org/10.1080/01431161.2020.1862437.
  • Chen, J., Z. Yuan, J. Peng, L. Chen, H. Huang, J. Zhu, Y. Liu, and H. Li. 2020. “DASNet: Dual Attentive Fully Convolutional Siamese Networks for Change Detection in High-Resolution Satellite Images.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14: 1194–1206. https://doi.org/10.1109/JSTARS.2020.3037893.
  • Dave, C. P., R. Joshi, and S. S. Srivastava. 2015. “A Survey on Geometric Correction of Satellite Imagery.” International Journal of Computer Applications 116 (12): 24–27. https://doi.org/10.5120/20389-26553.
  • Esfandiari, M., G. Abdi, S. Jabari, and V. S. P. Lolla. 2021. “Building Change Detection in Off-Nadir Images Using Deep Learning” In IGARSS 2021 IEEE, 1347–1350. https://doi.org/10.1109/IGARSS47720.2021.9553172. IEEE Publications.
  • Feng, W., Z. Q. Liu, L. Wan, C. M. Pun, and J. Jiang. 2013. “A Spectral-Multiplicity-Tolerant Approach to Robust Graph Matching.” Pattern Recognition 46 (10): 2819–2829. https://doi.org/10.1016/j.patcog.2013.03.003.
  • Girshick, R. 2015. “Fast R-CNN.” Paper presented at the CVPR Workshops, Santiago, Chile, 13–16 December. pp. 1440–1448. https://doi.org/10.48550/arXiv.1504.08083.
  • Girshick, R., J. Donahue, T. Darrell, and J. Malik. 2014. “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation.” Paper presented at the CVPR Workshops, Columbus, OH, USA, 24–27 June. pp. 580–587. https://doi.org/10.48550/arXiv.1311.2524.
  • Han, Q., Q. Yin, X. Zheng, and Z. Chen. 2022. “Remote Sensing Image Building Detection Method Based on Mask R-CNN.” Complex & Intelligent Systems 8 (3): 1847–1855. https://doi.org/10.1007/s40747-021-00322-z.
  • Han, Y., A. Javed, S. Jung, and S. Liu. 2020. “Object-Based Change Detection of Very High Resolution Images by Fusing Pixel-Based Change Detection Results Using Weighted Dempster–Shafer Theory.” Remote Sensing 12 (6): 983. https://doi.org/10.3390/rs12060983.
  • He, K., G. Gkioxari, P. Dollar, and R. Girshick. 2017. “Mask R-CNN” In IEEE International Conference on Computer Vision, 2961–2969. https://doi.org/10.48550/arXiv.1703.06870. IEEE Publications.
  • Huang, D., Y. Tang, and R. Qin. 2022. “An Evaluation of Planetscope Images for 3D Reconstruction and Change Detection–Experimental Validations with Case Studies.” GIScience & Remote Sensing 59 (1): 744–761. https://doi.org/10.1080/15481603.2022.2060595.
  • Huang, X., L. Zhang, and T. Zhu. 2014. “Building Change Detection from Multitemporal High-Resolution Remotely Sensed Images Based on a Morphological Building Index.” IEEE Journal of Selected Topics in Applied Earth Observations & Remote Sensing 7 (1): 105–115. https://doi.org/10.1109/JSTARS.2013.2252423.
  • Hussain, M., C. Dongmei, C. Angela, W. Hui, and S. David. 2013. “Change Detection from Remotely Sensed Images: From Pixel-Based to Object-Based Approaches.” Isprs Journal of Photogrammetry & Remote Sensing 80: 91–106. https://doi.org/10.1016/j.isprsjprs.2013.03.006.
  • Ji, S., Y. Shen, M. Lu, and Y. Zhang. 2019. “Building Instance Change Detection from Large-Scale Aerial Images Using Convolutional Neural Networks and Simulated Samples.” Remote Sensing 11 (11): 1343. https://doi.org/10.3390/rs11111343.
  • Ji, S., S. Wei, and M. Lu. 2019. “Fully Convolutional Networks for Multisource Building Extraction from an Open Aerial and Satellite Imagery Data Set.” IEEE Transactions on Geoscience and Remote Sensing: A Publication of the IEEE Geoscience and Remote Sensing Society 57: 574–586. https://doi.org/10.1109/TGRS.2018.2858817.
  • Jiang, H., M. Peng, Y. Zhong, H. Xie, Z. Hao, J. Lin, X. Ma, and X. Hu. 2022. “A Survey on Deep Learning-Based Change Detection from High-Resolution Remote Sensing Images.” Remote Sensing 14 (7): 1552. https://doi.org/10.3390/rs14071552.
  • Kalinicheva, E. D., D. Ienco, J. Sublime, and M. Trocan. 2020. “Unsupervised Change Detection Analysis in Satellite Image Time Series Using Deep Learning Combined with Graph-Based Approaches.” IEEE Journal of Selected Topics in Applied Earth Observations & Remote Sensing 13: 1450–1466. https://doi.org/10.1109/JSTARS.2020.2982631.
  • Kim, J., and K. Yu. 2015. “Areal Feature Matching Based on Similarity Using CRITIC Method.” International Archives of the Photogrammetry, Remote Sensing & Spatial Information Sciences XL-2/W4: 75–78. https://doi.org/10.5194/isprsarchives-XL-2-W4-75-2015.
  • Li, X., and J. Wen. 2017. “Net-Zero Energy Building Clusters Emulator for Energy Planning and Operation Evaluation.” Computers, Environment and Urban Systems 62: 168–181. https://doi.org/10.1016/j.compenvurbsys.2016.09.007.
  • Li, Z., Q. Xin, Y. Sun, and M. Cao. 2021. “A Deep Learning-Based Framework for Automated Extraction of Building Footprint Polygons from Very High-Resolution Aerial Imagery.” Remote Sensing 13 (18): 3630. https://doi.org/10.3390/rs13183630.
  • Liu, J., W. Xuan, Y. Gan, Y. Zhan, J. Liu, and B. Du. 2022. “An End-To-End Supervised Domain Adaptation Framework for Cross-Domain Change Detection.” Pattern Recognition 132: 108960. https://doi.org/10.1016/j.patcog.2022.108960.
  • Liu, Y., D. Chen, A. Ma, Y. Zhong, F. Fang, and K. Xu. 2021. “Multiscale U-Shaped CNN Building Instance Extraction Framework with Edge Constraint for High-Spatial-Resolution Remote Sensing Imagery.” IEEE Transactions on Geoscience & Remote Sensing 59: 6106–6120. https://doi.org/10.1109/TGRS.2020.3022410.
  • Pham, M., G. Mercier, and J. Michel. 2016. “Change Detection Between SAR Images Using a Pointwise Approach and Graph Theory.” IEEE Transactions on Geoscience & Remote Sensing 54 (4): 2020–2032. https://doi.org/10.1109/TGRS.2015.2493730.
  • Ren, S., K. He, R. Girshick, and J. Sun. 2017. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.” IEEE Transactions on Pattern Analysis & Machine Intelligence 39 (6): 1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031.
  • Sun, Y., L. Lei, X. Li, X. Tan, G. Kuang, X. Li, B. Zhang, and A. Plaza. 2022. “Structure Consistency-Based Graph for Unsupervised Change Detection with Homogeneous and Heterogeneous Remote Sensing Images.” IEEE Transactions on Geoscience & Remote Sensing 60 (60): 1–16. https://doi.org/10.1109/TGRS.2021.3053571.
  • Tian, J., S. Cui, and P. Reinartz. 2014. “Building Change Detection Based on Satellite Stereo Imagery and Digital Surface Models.” IEEE Transactions on Geoscience & Remote Sensing 52 (1): 406–417. https://doi.org/10.1109/TGRS.2013.2240692.
  • Usmani, M., M. Napolitano, and F. Bovolo. 2023. “Towards Global Scale Segmentation with OpenStreetmap and Remote Sensing.” ISPRS Open Journal of Photogrammetry and Remote Sensing 8: 100031. https://doi.org/10.1016/j.ophoto.2023.100031.
  • Van Etten, A., D. Lindenbaum, and T. M. Bacastow. 2018. “SpaceNet: A Remote Sensing Dataset and Challenge Series.” arXiv Preprint arXiv:1807.01232. https://doi.org/10.48550/arXiv.1807.01232.
  • Wang, N., W. Li, R. Tao, and Q. Du. 2022. “Graph-Based Block-Level Urban Change Detection Using Sentinel-2 Time Series.” Remote Sensing of Environment 274: 112993. https://doi.org/10.1016/j.rse.2022.112993.
  • Wang, Q., X. Zhang, G. Chen, F. Dai, Y. Gong, and K. Zhu. 2018. “Change Detection Based on Faster R-CNN for High-Resolution Remote Sensing Images.” Remote Sensing Letters 9 (10): 923–932. https://doi.org/10.1080/2150704X.2018.1492172.
  • Wang, Y., S. Li, F. Teng, Y. Lin, M. Wang, and H. Cai. 2022. “Improved Mask R-CNN for Rural Building Roof Type Recognition from UAV High-Resolution Images: A Case Study in Hunan Province, China.” Remote Sensing 14 (2): 265. https://doi.org/10.3390/rs14020265.
  • Wang, Z., N. Xu, B. Wang, Y. Liu, and S. Zhang. 2022. “Urban Building Extraction from High-Resolution Remote Sensing Imagery Based on Multi-Scale Recurrent Conditional Generative Adversarial Network.” GIScience & Remote Sensing 59 (1): 861–884. https://doi.org/10.1080/15481603.2022.2076382.
  • Wu, J., B. Li, Y. Qin, W. Ni, and H. Zhang. 2021. “An Object-Based Graph Model for Unsupervised Change Detection in High Resolution Remote Sensing Images.” International Journal of Remote Sensing 42 (16): 6209–6227. https://doi.org/10.1080/01431161.2021.1937372.
  • Xia, L., J. Chen, J. Luo, J. Zhang, D. Yang, and Z. Shen. 2022. “Building Change Detection Based on an Edge-Guided Convolutional Neural Network Combined with a Transformer.” Remote Sensing 14 (18): 4524. https://doi.org/10.3390/rs14184524.
  • Xiao, P. F., M. Yuan, X. L. Zhang, X. Z. Feng, and Y. W. Guo. 2017. “Cosegmentation for Object-Based Building Change Detection from High-Resolution Remotely Sensed Images.” IEEE Transactions on Geoscience & Remote Sensing 55 (3): 1587–1603. https://doi.org/10.1109/TGRS.2016.2627638.
  • Xu, Y., L. Wu, Z. Xie, and Z. Chen. 2018. “Building Extraction in Very High Resolution Remote Sensing Imagery Using Deep Learning and Guided Filters.” Remote Sensing 10 (1): 144. https://doi.org/10.3390/rs10010144.
  • Yan, X., T. Ai, M. Yang, and H. Yin. 2019. “A Graph Convolutional Neural Network for Classification of Building Patterns Using Spatial Vector Data.” ISPRS Journal of Photogrammetry & Remote Sensing 150: 259–273. https://doi.org/10.1016/j.isprsjprs.2019.02.010.
  • Ye, Y., L. Zhou, B. Zhu, C. Yang, M. Sun, J. Fan, and Z. Fu. 2022. “Feature Decomposition-Optimization-Reorganization Network for Building Change Detection in Remote Sensing Images.” Remote Sensing 14 (3): 722. https://doi.org/10.3390/rs14030722.
  • Yu, B., F. Chen, N. Wang, L. Yang, H. Yang, and L. Wang. 2022. “MSFTrans: A Multi-Task Frequency-Spatial Learning Transformer for Building Extraction from High Spatial Resolution Remote Sensing Images.” GIScience & Remote Sensing 59 (1): 1978–1996. https://doi.org/10.1080/15481603.2022.2143678.
  • Zhao, K., J. Kang, J. Jung, and G. Sohn. 2018. “Building Extraction from Satellite Images Using Mask R-CNN with Building Boundary Regularization.” Paper presented at the CVPR Workshops, Salt Lake City, Utah. June 18-22. https://doi.org/10.1109/CVPRW.2018.00045.
  • Zheng, H., M. Gong, T. Liu, F. Jiang, T. Zhan, D. Lu, and M. Zhang. 2022. “HFA-Net: High Frequency Attention Siamese Network for Building Change Detection in VHR Remote Sensing Images.” Pattern Recognition 129: 108717. https://doi.org/10.1016/j.patcog.2022.108717.