Research Article

Leveraging involution and convolution in an explainable building damage detection framework

Article: 2252166 | Received 03 Aug 2022, Accepted 22 Aug 2023, Published online: 05 Sep 2023

ABSTRACT

Timely and accurate building damage mapping is essential for supporting disaster response activities. While remote sensing (RS) satellite imagery can provide the basis for building damage map generation, detecting building damage with traditional methods is generally challenging. Traditional approaches rely on bi-temporal pre/post-event datasets, from which damage information is difficult to extract. Furthermore, these methods require manual feature engineering for supervised learning models. To tackle these limitations, this research proposes a novel building damage map generation approach based only on post-event RS satellite imagery and advanced deep feature extractor layers. The proposed deep learning (DL) based framework is applied in an end-to-end manner without additional processing. The method consists of five main steps: (1) pre-processing, (2) model training and optimization of model parameters, (3) damage map generation, (4) accuracy assessment, and (5) visual explanation of the proposed method’s predictions. The performance of the proposed method is evaluated on two real-world RS datasets, the Haiti earthquake and the Bata explosion. The results show that the proposed method is highly effective, yielding an overall accuracy (OA) of more than 84%, which is superior to other advanced DL-based damage detection methods.

Introduction

A natural disaster is an extreme event that occurs within Earth’s system and can cause widespread destruction, substantial collateral damage, and even loss of life as the result of forces independent of human activities (Davies et al., 2021; Kapucu et al., 2022). Among the types of natural disasters, earthquakes are the deadliest, typically causing thousands of deaths (Binici et al., 2022; Lin et al., 2022; Ünlü & Kiriş, 2021). Earthquake damage mapping is a critical analysis after each earthquake (Li et al., 2019), as it provides valuable information about damage for applications such as relief and response as well as insurance assessments (Cui et al., 2021). In urban areas in particular, buildings are the main components damaged by earthquakes (Rupnik et al., 2018). Thus, accurate and timely building damage mapping is very important for managing the response and performing subsequent analyses (Jundullah & Wahyu Wijayanto, 2022; Weber et al., 2022).

Advancements in technology have increased interest in building damage mapping as a research topic (Cotrufo et al., 2018). Recent assessments of earthquake-induced building damage can be grouped by the type of dataset used. Damage assessment is mainly based on light detection and ranging (Lidar) (Talreja et al., 2021), synthetic aperture radar (SAR) (Boloorani et al., 2021; ElGharbawi & Zarzoura, 2021; Wang et al., 2022), very high resolution (VHR) satellite imagery (D’Addabbo et al., 2022), and unmanned aerial vehicle (UAV) imagery (Naito et al., 2020).

Because VHR datasets are widely available and simple to interpret, they are more convenient for mapping building damage than other kinds of datasets, and many VHR-based damage detection methods have been developed in previous studies. These methods can be categorized into two main groups: (a) damage detection based on a bi-temporal VHR dataset and (b) damage detection based on only a single post-event dataset.

Damage detection methods based on bi-temporal datasets are the more popular group in building damage mapping. These methods extract damage information from pre/post-event datasets through change detection analysis. For instance, Gupta and Shah (2020) designed RescueNet, a building damage mapping method based on a bi-temporal pre/post-event dataset. RescueNet is based on ResNet50 and employs an atrous spatial pyramid pooling module for extracting multi-scale features. Furthermore, RescueNet segments the buildings and assesses individual damage levels simultaneously. Ji et al. (2020) employ a pre-trained CNN model to map damaged buildings by incorporating a bi-temporal pre/post-earthquake VHR dataset. The pre-trained VGG-Net model is used to classify the deliberately collapsed buildings, and the fine-tuned VGG-Net demonstrates superior performance to the model trained from scratch. Merlin and Wiselin Jiji (2019) propose a strategy for mapping damage based on change detection on the pre/post-event VHR dataset. This framework follows several steps for detecting building damage: (1) building detection by the integration of color invariants and thresholding, (2) change detection by image differencing, (3) spectral and spatial feature extraction followed by feature selection with a feature-ranking approach, and (4) classification by a feature-mean-ratio algorithm. Although these methods have obtained promising results, they rely on change detection between bi-temporal pre/post-event VHR datasets, and a pre-event dataset is sometimes difficult to obtain. Furthermore, changes in a bi-temporal dataset can originate from conditions unrelated to damage, which can result in false alarms (Seydi & Hasanlou, 2021).

Damage detection methods based on a single post-event dataset extract damage patterns from the post-event dataset alone. Unlike the first group, these methods require neither the pre-event dataset nor some additional pre-processing (e.g. image-to-image registration). Since the pre-event dataset is not always available, these methods can be more practical for building damage assessment, and several studies have considered them. For instance, Ünlü and Kiriş (2021) designed a deep learning-based damage mapping framework using a convolutional neural network (CNN) and image segmentation. This framework is applied in two phases: (1) image segmentation by a k-means clustering method that separates the “damaged” and “non-damaged” categories and (2) deep learning-based methods (e.g. visual geometry group or VGG-16). CNN models are employed for labeling segments in three classes: damaged, less-damaged and non-damaged. Furthermore, Zheng et al. (2021) designed an object-based semantic segmentation method for building damage assessment. A deep object localization network replaces the super-pixel segmentation commonly used in the conventional object-based image analysis (OBIA) procedure for generating accurate building objects, an approach that offers seamless integration of OBIA and DL. Additionally, a unified semantic change detection network is constructed using a deep object localization network as well as a deep damage classification network. Ji et al. (2019) compare the efficiency of texture features derived from a grey-level co-occurrence matrix with that of deep features by utilizing a random forest classifier and a pre/post-earthquake VHR dataset. The results of this damage assessment show that the deep features outperform texture features in identifying collapsed buildings. Furthermore, the combination of CNN with the random forest classifier has greater accuracy in damage mapping than CNN alone. Although the above-mentioned methods have provided acceptable results, accurate damage detection from only post-event datasets remains a major challenge. This difficulty stems from the complexity of urban areas, which contain different types of buildings with different shapes and sizes, and it makes post-event damage detection a challenging research topic.

Previous studies indicate that RS VHR imagery has high potential for damage mapping, and that the result depends on the quality of the features and the classifier used. Most of the above-mentioned methods in both groups use standard convolution layers with additional pre-processing such as OBIA. The efficiency of advanced deep feature extraction layers (i.e. involution) and attention mechanisms has been largely ignored in recent building damage assessment studies. To address these challenges and enhance the performance of damage detection, a more advanced procedure for identifying damaged buildings is needed.

DL-based algorithms have recently demonstrated very promising results in many RS applications, such as crop mapping (Natteshan & Suresh Kumar, 2020), classification (Wang & Miao, 2022; Xu et al., 2021), and algae monitoring (Huynh et al., 2022). Accordingly, this research focuses on a novel damage detection framework based on DL. The proposed method uses the post-earthquake VHR dataset and building vector maps to assess building damage. The presented DL framework is based on a combination of multiscale convolution layers and a new variant of the convolution layer called the involution kernel. Furthermore, an attention mechanism is incorporated to enhance the robustness of the network.

Black-box artificial intelligence algorithms remain a barrier to adoption, since they lack interpretability and explainability (Petch et al., 2021). In contrast to simpler and self-explanatory models, DL-based methods lack interpretability due to their complexity and nonlinear nature. An explainable artificial intelligence (XAI) model is an artificial intelligence model that generates an output that humans can understand, the opposite of a “black box” model (Langer et al., 2021; Rojat et al., 2021). That is, XAI models provide an understanding of what is causing the model output. By using XAI, we can assess the rationality of the training dataset for damage mapping, which is vital in supervised learning methods.

This study makes the following main contributions: (1) we present a novel DL-based framework for building damage assessment; (2) we apply an XAI approach to the building damage mapping model to describe which features contribute to the model output and how; (3) we employ an advanced feature representation method based on the involution kernel and multi-scale convolution; and (4) we compare the efficiency of our framework with that of other DL-based methods in two different real-world case study areas.

Methodology

The general overview of the building damage detection framework is shown in Figure 1. The process is applied in five main steps: (1) pre-processing, which prepares the input data for the subsequent analysis; (2) model training and optimization of model parameters; (3) damage map generation, in which the trained model produces the final building damage map: the footprints of unlabeled buildings are extracted by overlaying the vector maps on the image, each footprint is fed into the predictive model to obtain a label, and the predicted label is assigned to all pixels within the footprint in the raster map (Figure 2); (4) accuracy assessment, in which the damage mapping results are compared with a reference map; and (5) visual explanation of the proposed method’s predictions, which helps to understand what is driving the model output.

Figure 1. Flowchart of the proposed framework for building damage mapping.

Figure 2. The labeling process of the proposed damage mapping framework.
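The labeling step of the framework (assigning each building's predicted label to every pixel inside its footprint) can be illustrated with a minimal sketch. It assumes the footprints have already been rasterized to boolean masks; the function name `paint_damage_map` and the class codes (0 background, 1 non-damaged, 2 damaged) are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

def paint_damage_map(shape, footprint_masks, labels, background=0):
    """Write each building's predicted label into all pixels of its footprint.

    shape: (H, W) of the output raster.
    footprint_masks: list of boolean (H, W) arrays, one per building.
    labels: predicted class code per building (assumed: 1 non-damaged, 2 damaged).
    """
    damage_map = np.full(shape, background, dtype=np.uint8)
    for mask, label in zip(footprint_masks, labels):
        damage_map[mask] = label
    return damage_map

# Two toy footprints on a 4 x 4 raster.
mask_a = np.zeros((4, 4), dtype=bool)
mask_a[:2, :2] = True
mask_b = np.zeros((4, 4), dtype=bool)
mask_b[2:, 2:] = True
toy_map = paint_damage_map((4, 4), [mask_a, mask_b], labels=[1, 2])
```

In practice the masks would come from rasterizing the building vector map onto the image grid, which is outside the scope of this sketch.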

Pre-processing

Pre-processing is the first step of our damage mapping framework and prepares the input datasets for the DL model. The main pre-processing steps are as follows. (1) Registration of the VHR dataset with the vector map: the registration is checked by overlaying the building polygons on the image dataset. (2) Building footprint extraction: building footprints are extracted by overlaying the building polygons (vector maps) on the VHR dataset, and each building footprint is used as an input sample for the proposed framework. Because buildings have different sizes while the input size of the network is constant, small buildings are up-sampled by interpolation and large buildings are down-sampled by aggregation. (3) Data augmentation, which increases the number of samples through simple operations. This research applies data augmentation to the Bata explosion dataset, using rotation (90 degrees), random flip (left and right), random flip (up and down), and transpose. Buildings are classified as damaged or non-damaged according to the presence of debris. Accordingly, two classes are defined as follows:

Non-Damaged Building: A building with an intact roof is classified as a non-damaged building.

Damaged Building: A building whose roof has been destroyed by an earthquake is considered a damaged building.
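The resizing and augmentation operations named above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the flips are applied deterministically rather than at random, nearest-neighbour sampling stands in for the interpolation and aggregation the paper uses (the exact methods are not specified), and the function names are hypothetical.

```python
import numpy as np

def augment(patch):
    """Return the original (H, W, C) patch plus the four variants named in the text."""
    return [
        patch,
        np.rot90(patch, k=1, axes=(0, 1)),  # 90-degree rotation
        patch[:, ::-1],                      # left-right flip
        patch[::-1, :],                      # up-down flip
        np.transpose(patch, (1, 0, 2)),      # transpose of the two spatial axes
    ]

def resize_nearest(patch, size):
    """Resize an (H, W, C) patch to (size, size, C) by nearest-neighbour sampling."""
    h, w = patch.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return patch[rows][:, cols]

# A footprint crop of arbitrary size is brought to a fixed 25 x 25 input, then augmented.
crop = np.arange(10 * 40 * 4, dtype=np.float32).reshape(10, 40, 4)
fixed = resize_nearest(crop, 25)
variants = augment(fixed)
```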

Proposed network training

The proposed method has several parameters that require tuning. The network is trained with the back-propagation method using the training and validation datasets. The model parameters are initialized by an initializer such as He-Normal (He et al., 2015). They are then optimized using the training samples and evaluated using the validation dataset. The error of the network, calculated by the loss function, is fed to an optimizer such as adaptive moment estimation (Adam) (Kingma & Jimmy, 2017), which updates the parameters to reduce the error. This process continues until a stop condition (e.g. a maximum number of iterations) is reached. After training, the optimum model is employed for the next analysis. We employ the binary cross-entropy (BCE) as the loss function, which is defined in Equation (1).

$$\mathrm{BCE} = -\left(v \log p + (1 - v)\log(1 - p)\right), \tag{1}$$

where $v$ is the real label and $p$ is the output of the method.
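As a small worked illustration, the snippet below evaluates the binary cross-entropy in its standard form, with the conventional leading minus sign so that the loss is non-negative. The clipping constant `eps` and the function names are assumptions added for numerical safety and readability; they are not part of the paper.

```python
import math

def binary_cross_entropy(v, p, eps=1e-7):
    """BCE for one sample: v is the true label (0 or 1), p the predicted probability."""
    p = min(max(p, eps), 1.0 - eps)  # keep log() finite at p = 0 or p = 1
    return -(v * math.log(p) + (1 - v) * math.log(1 - p))

def mean_bce(labels, probs):
    """Average BCE over a mini-batch."""
    return sum(binary_cross_entropy(v, p) for v, p in zip(labels, probs)) / len(labels)

# A confident correct prediction is penalized less than a confident wrong one.
low = binary_cross_entropy(1, 0.9)
high = binary_cross_entropy(1, 0.1)
```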

Proposed DL architecture

Figure 3 presents the general overview of our DL network for building damage mapping. As can be seen from this structure, the proposed framework is composed of two fundamental components: (1) deep feature extraction and (2) classification. The deep feature extraction component includes convolution and involution layers that extract high-level, meaningful deep features. In addition, the squeeze-and-excitation (SE) block is employed in the main structure of the proposed network to enhance the robustness of the network in the generation of deep features. Then, a flattening layer reshapes the deep features into a one-dimensional feature vector, which connects the deep feature extraction and classification components. The classification component includes two fully connected layers and an output dense layer with a soft-max activation function that makes the final decision.

Figure 3. Proposed involution/convolutional neural network architecture for building damage mapping.

The proposed network has some novelty and differences from other similar networks for building damage detection:

  1. Utilizing a multi-scale kernel instead of only a single kernel convolution enhances the robustness of the network against size variation.

  2. Employing an involution kernel to obtain rich feature representations.

  3. Utilizing an SE block to obtain the extraction of informative deep features.

The SE attention block

The SE block enhances channel interdependencies with minimal additional computational cost. The three main operations in SE are (1) squeezing, which reduces the spatial dimensions of each feature map to a single value through global average pooling; (2) excitation, which learns adaptive scaling weights for each channel through dense layers with different activation functions; and (3) rescaling, which multiplies the input feature maps element-wise by these weights and so restores the original size. Figure 4 illustrates the overview of the SE attention block.

Figure 4. 2D SE attention mechanism schema.
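The three SE operations can be written compactly as a forward pass in NumPy. This is a hedged sketch rather than the paper's implementation: the ReLU and sigmoid activations follow the original squeeze-and-excitation design and are assumed here, and the weight shapes, the reduction ratio `r`, and the function name `se_block` are illustrative.

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-excitation forward pass for one feature map x of shape (H, W, C).

    w1: (C, C // r) and w2: (C // r, C) are the two dense layers (assumed shapes).
    """
    z = x.mean(axis=(0, 1))                    # squeeze: global average pooling -> (C,)
    s = np.maximum(z @ w1 + b1, 0.0)           # excitation, dense 1 + ReLU (assumed)
    s = 1.0 / (1.0 + np.exp(-(s @ w2 + b2)))   # excitation, dense 2 + sigmoid (assumed)
    return x * s                               # rescale: per-channel weights broadcast over H, W

rng = np.random.default_rng(0)
C, r = 8, 4
x = rng.normal(size=(5, 5, C))
out = se_block(x, rng.normal(size=(C, C // r)), np.zeros(C // r),
               rng.normal(size=(C // r, C)), np.zeros(C))
```

With all weights set to zero the sigmoid outputs 0.5 for every channel, so the block simply halves its input, which is a convenient sanity check.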

Multi-scale convolution layers

Convolution layers are the basic operators in DL-based procedures. These layers automatically provide high-level, meaningful deep features from the given image (Jagannathan & Divya, 2021). The feature value ($V$) of the $l$th layer with input data $x$ and a non-linear activation $G$ is obtained as follows in Equation (2) (Seydi et al., 2021):

$$V^{l} = G\left(w^{l} x^{l-1} + b^{l}\right), \tag{2}$$

where $b$ and $w$ are the bias and weight vectors in the $l$th layer, respectively. The output of a convolution layer with kernel size ($M \times N$) at position $(x, y)$ is calculated as follows in Equation (3) (Yu et al., 2020):

$$V_{i,j}^{xy} = G\left(b_{i,j} + \sum_{n=0}^{N_i - 1} \sum_{m=0}^{M_i - 1} W_{i,j,\chi}^{n,m}\, V_{i-1}^{(x+n)(y+m)}\right). \tag{3}$$

The multiscale kernel convolution layer uses different kernel sizes (i.e. 3×3, 5×5, and 7×7) instead of only a constant kernel size. It improves the robustness of the network against size variations by adopting multiscale blocks.
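To make the multiscale idea concrete, the sketch below runs 3×3, 5×5, and 7×7 kernels over the same single-channel input and stacks the responses as channels. How the paper merges its branches is not stated, so stacking (concatenation) is an assumption, as are the ReLU activation, zero padding, and the function names; simple averaging kernels stand in for learned weights.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Single-channel 'same' cross-correlation with zero padding; kernel is (k, k), k odd."""
    k = kernel.shape[0]
    xp = np.pad(x, k // 2)
    windows = np.lib.stride_tricks.sliding_window_view(xp, (k, k))  # (H, W, k, k)
    return np.einsum("hwij,ij->hw", windows, kernel)

def multiscale_block(x, kernels):
    """Apply each kernel, then ReLU, and stack the branch outputs: (H, W, n_kernels)."""
    return np.stack([np.maximum(conv2d_same(x, k), 0.0) for k in kernels], axis=-1)

# Averaging kernels of three sizes applied to a constant 9 x 9 image.
kernels = [np.ones((s, s)) / (s * s) for s in (3, 5, 7)]
image = np.ones((9, 9))
features = multiscale_block(image, kernels)
```

Each branch sees a different neighbourhood size around the same pixel, which is what gives the block its tolerance to buildings of different sizes.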

Involution kernel

Convolution kernels are space-agnostic and channel-specific, which makes them incapable of adapting to diverse visual patterns at different locations (Li et al., 2021). Convolutional layers can capture richer spatial context and longer-range spatial interactions with larger kernels, since the receptive field grows with the kernel size. However, the number of parameters of a convolution kernel increases quadratically with the kernel size, so in practice the limited receptive field makes it difficult for convolution to capture long-range spatial interactions. To address this limitation, the involution kernel was introduced, which is location-specific and channel-agnostic (Meng et al., 2021). Because it has fewer parameters and requires less computation, the involution layer is more efficient than traditional convolution layers. Thus, combining involution layers with basic layers can help to increase the performance of networks with few parameters. The general structure of the involution layer is shown in Figure 5.

Figure 5. The involution schema. ⊕ and ⊗ refer to the summation and multiplication operations, respectively.

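Let $\mathcal{H} \in \mathbb{R}^{H \times W \times \rho \times \rho \times \Phi}$ indicate the involution kernel with $\Phi$ groups. The involution kernel at position $(x, y)$ is $\mathcal{H}_{x,y,\cdot,\cdot,\varphi} \in \mathbb{R}^{\rho \times \rho}$, $\varphi = 1, 2, \ldots, \Phi$. The output feature map $F$ of the involution kernel at coordinate $(x, y)$ for the input feature map $X$ is defined in Equation (4) (Li et al., 2021).

$$F_{x,y,\theta} = \sum_{(u,v) \in \Delta_\rho} \mathcal{H}_{x,\,y,\,u + \lfloor \rho/2 \rfloor,\,v + \lfloor \rho/2 \rfloor,\,\lceil \theta\Phi/C \rceil}\; X_{x+u,\,y+v,\,\theta}, \tag{4}$$

where $\mathcal{H}_{x,y}$ is generated solely conditioned on the spectral feature vector $X_{x,y} \in \mathbb{R}^{C}$ for efficiency, as defined in Equation (5) (Li et al., 2021). Furthermore, $\Delta_\rho \subset \mathbb{Z}^2$ represents the set of neighborhood offsets around the center pixel.

$$\mathcal{H}_{x,y,\cdot,\cdot,\psi} = \phi(X_{x,y}) = W_1\, \delta\left(\mathrm{BN}(W_0 X_{x,y})\right), \tag{5}$$

where $\phi: \mathbb{R}^{C} \rightarrow \mathbb{R}^{K \times K \times G}$ denotes the kernel generation function; $W_0 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $W_1 \in \mathbb{R}^{(K \times K \times G) \times \frac{C}{r}}$ are the parameters of the first and second linear transformations, respectively. Here $K$ and $G$ correspond to the kernel size $\rho$ and the number of groups $\Phi$ used above. Furthermore, BN denotes batch normalization and $r$ is the reduction ratio.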
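The involution operation can be sketched in NumPy to show the two ideas in the definition: a kernel is generated per pixel from that pixel's own feature vector, and the same kernel is shared by all channels within a group. This is an illustrative forward pass under simplifying assumptions: batch normalization is omitted, ReLU is assumed as the non-linearity, zero padding is used at the borders, and the function and argument names are hypothetical.

```python
import numpy as np

def involution(x, w0, w1, k, groups):
    """Involution forward pass. x: (H, W, C); w0: (C // r, C); w1: (k * k * groups, C // r)."""
    h, w, c = x.shape
    # Kernel generation: each pixel's feature vector yields a k x k kernel per group.
    hidden = np.maximum(x @ w0.T, 0.0)                      # (H, W, C // r); BN omitted
    kernels = (hidden @ w1.T).reshape(h, w, k, k, groups)   # (H, W, k, k, G)
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    # patches[y, x, ch, i, j] holds the k x k neighbourhood of every pixel and channel.
    patches = np.lib.stride_tricks.sliding_window_view(xp, (k, k), axis=(0, 1))
    # Channels in the same group share one kernel (channel-agnostic within a group).
    per_channel = np.repeat(kernels, c // groups, axis=-1)  # (H, W, k, k, C)
    return np.einsum("hwcij,hwijc->hwc", patches, per_channel)

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 6, 4))
w0 = rng.normal(size=(2, 4))           # reduction ratio r = 2
w1 = rng.normal(size=(3 * 3 * 2, 2))   # k = 3, two groups
y = involution(x, w0, w1, k=3, groups=2)
```

Note that the parameter count depends on `k`, `groups`, and the reduction ratio but not on the number of kernel positions, which is where the efficiency gain over a large convolution kernel comes from.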

XAI Grad-CAM interpretation

Gradient-weighted class activation mapping (Grad-CAM) is a well-known class activation mapping-based method that employs backpropagation to score the positions of the feature maps in a layer (Sattarzadeh et al., 2021). Recently, this method has outperformed other XAI methods in remote sensing applications (e.g. classification) (Kakogeorgiou & Karantzalos, 2021; Stomberg et al., 2022). Unlike other XAI methods (e.g. CAM), the Grad-CAM method does not need global average pooling; as a result, Grad-CAM has become widely used for visualizing key features (Kakogeorgiou & Karantzalos, 2021). This method uses the output of the features in the last convolution layer to build the saliency map (Sattarzadeh et al., 2021). The saliency map can be calculated from the feature map ($\Psi \in \mathbb{R}^{w \times h \times N}$) in the last convolution layer for class $c$, as shown in Equations (6) and (7).

$$S_t = \sum_{i=1}^{N} \mathrm{ReLU}\left(a_i^{c} \cdot \Psi_i\right) \tag{6}$$

$$a_i^{c} = \frac{1}{w \times h} \sum_{u=1}^{w} \sum_{v=1}^{h} G_{u,v}^{c}, \tag{7}$$

where $G$ is the corresponding gradient for the feature map $\Psi_i$.
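Given the feature maps of the last convolution layer and the gradients of the class score with respect to them, the saliency map reduces to a few array operations. The sketch below follows the order of operations in Equation (6) as given in this section, applying the ReLU to each weighted map before summation; the original Grad-CAM formulation instead applies the ReLU after the weighted sum. Obtaining the gradients requires a DL framework and is assumed to have been done already; the function name and the final scaling to [0, 1] are illustrative additions.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Saliency map from last-layer features and gradients, both of shape (w, h, N)."""
    weights = gradients.mean(axis=(0, 1))                 # a_i^c: one weight per feature map
    weighted = np.maximum(weights * feature_maps, 0.0)    # ReLU on each weighted map
    saliency = weighted.sum(axis=-1)                      # sum over the N feature maps
    peak = saliency.max()
    return saliency / peak if peak > 0 else saliency      # scaled to [0, 1] for display

# Toy input: two feature maps, one with positive and one with negative gradient.
toy_features = np.ones((2, 2, 2))
toy_gradients = np.stack([np.ones((2, 2)), -np.ones((2, 2))], axis=-1)
toy_saliency = grad_cam(toy_features, toy_gradients)
```

In the toy case the negatively weighted map is suppressed by the ReLU, so only the first feature map contributes to the result.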

Accuracy assessment

Accuracy assessment is performed through two procedures: (1) visual analysis and (2) numerical analysis by measurement indices. The numerical analysis compares the results of the proposed method in the test areas with the testing dataset. This research uses the confusion matrix and six widely adopted indices derived from it to assess thematic quality. These indices are overall accuracy, the Kappa coefficient, omission error, commission error, recall, and precision.
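The six indices can all be derived from the four cells of a two-class confusion matrix, as the following sketch shows. It uses the common definitions in which omission error is the complement of recall and commission error is the complement of precision; the paper does not spell out its formulas, so these definitions, the function name, and the example counts are assumptions for illustration only.

```python
def binary_metrics(tp, fn, fp, tn):
    """Indices for the 'damaged' class from a 2 x 2 confusion matrix.

    tp: damaged predicted as damaged      fn: damaged predicted as non-damaged
    fp: non-damaged predicted as damaged  tn: non-damaged predicted as non-damaged
    """
    n = tp + fn + fp + tn
    oa = (tp + tn) / n
    # Chance agreement from the row and column totals, used by the Kappa coefficient.
    pe = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "overall_accuracy": oa,
        "kappa": (oa - pe) / (1 - pe),
        "recall": recall,
        "precision": precision,
        "omission_error": 1 - recall,
        "commission_error": 1 - precision,
    }

# Hypothetical counts, not results from the paper.
scores = binary_metrics(tp=40, fn=10, fp=5, tn=45)
```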

Furthermore, to evaluate the performance of our framework, five common DL-based methods are implemented: (1) CNN (Kalantar et al., 2020), which includes two convolution layers, one max-pooling layer and two dense layers; (2) residual CNN (Res-CNN), which is built from one stem block for shallow deep feature generation, three residual blocks and one dense layer; (3) vision transformer (ViT) (Dosovitskiy et al., 2020), which is built from several multi-head attention layers and two fully connected layers; (4) channel-expanded CNN (CECNN) (Qing et al., 2022), which has five convolutional layers, two max-pooling layers and three fully connected layers; and (5) VGG-19 (Ünlü & Kiriş, 2022), which is built from 16 convolutional layers organized into five blocks, interspersed with max-pooling layers. These methods are applied under the same conditions as the proposed method, with the same hyperparameters.

Datasets and study areas

Dataset and case study #1: Bata explosion

During the afternoon of 7 March 2021, a series of explosions occurred at the Nkuantoma armory and military barracks in Bata, Equatorial Guinea’s economic center. As a result of these explosions, more than 100 people were killed and more than 600 were injured. Figure 6 shows the very high-resolution dataset and the location of the first case study area (Bata explosion). This dataset was captured by Worldview-III on 9 March 2021, with four spectral channels and a spatial resolution close to 50 cm.

Figure 6. The dataset used in the first study area: (a) post-event high-resolution data; (b) and (c) the geographical location of the study area.

Figure 7 illustrates the building vector map for the study area. This vector map was generated from a pre-event dataset and manually digitized by a local expert. It includes 706 building footprints, among which 338 polygons belong to non-damaged buildings (green polygons) and 368 polygons belong to the damaged buildings class (red polygons). Additionally, the ground truth for the Bata explosion dataset is available on an open public website (https://www.unitar.org).

Figure 7. The sample building footprints for the first study area in both classes.

Table 1 presents the details of the sample dataset used for this case study. The samples are divided into three groups: training data, validation data, and testing data.

Table 1. Characteristics of the sample dataset for Bata explosions.

Dataset and case study #2: Haiti earthquake

In the western portion of the Republic of Haiti, approximately 25 km south of the capital city of Port-au-Prince, a magnitude 7.0 earthquake struck at 4:53 pm local time on 12 January 2010. The Haitian government reported that over 316,000 people died or went missing, 300,000 were injured, and 1.3 million were left homeless by the earthquake (DesRoches et al., 2011). Worldview-II satellite imagery acquired on 15 January 2010 was used in this study to evaluate the proposed method (Figure 8). This dataset has a spatial resolution of 50 cm and three spectral channels (red, green, blue). The testing area includes buildings of different sizes and roof shapes.

Figure 8. Dataset used for the Haiti earthquake: (a) post-earthquake VHR image; (b) and (c) the geographical location of the study area.

The classification results are strongly influenced by the quality and quantity of the sample dataset. One of the most important evaluations of a classifier is the assessment of its generalization capability. To evaluate generalization, this research uses two different regions for training and evaluating the network. Figure 9 illustrates the distribution of the sample data for the two classes of damaged and intact polygons (red and green). In addition, the yellow polygons are used to assess the damage detection algorithms. A ground truth dataset for the Haiti earthquake can be found on the website (https://dataverse.harvard.edu).

Figure 9. Spatial distribution of the sample dataset for the Haiti earthquake.

The details of the sample data used are presented in Table 2. As with the first case study, the sample dataset is divided into three groups: training data, validation data, and testing data.

Table 2. Characteristics of the sample dataset for building damage mapping for the Haiti earthquake.

Experiment and results

DL-based algorithms have several parameters that need to be set. The values of these parameters are set as follows: the mini-batch size is 550, the dropout rate is 0.2, the number of epochs is 500, the learning rate is 10⁻³, and the number of neurons in the first and second fully connected layers is 550 and 250, respectively. Finally, due to the structure of the buildings in the two study areas, the input patch size of the model is set to 25 × 25 for the Bata explosion and 50 × 50 for the Haiti earthquake.

Results from damage mapping of the Bata explosion

The results of the damage detection methods for the Bata explosion are shown in Figure 10. Most damaged buildings are located in the center of the study area, while non-damaged buildings are located around it. Generally, all methods provide acceptable performance, but they differ in the details of the mapping. For example, some non-damaged buildings are detected as damaged buildings by the compared algorithms. The proposed method performs well in mapping both classes.

Figure 10. Visual comparison of the results from building damage mapping for the Bata explosion: (a) CNN, (b) Res-CNN, (c) CECNN, (d) VGG-19, (e) ViT, (f) the proposed method, and (g) ground-truth.

Figure 11 shows enlarged sample buildings with the results derived from the different classification methods. As can be seen, the proposed method correctly detects most of the damaged and most of the undamaged building polygons, while the other methods misclassify some buildings.

Figure 11. Enlarged sample buildings after the Bata explosion: green polygons denote damaged buildings; red polygons denote undamaged buildings.

The accuracy assessment for the building damage map is presented in Table 3. As observed, the CNN, Res-CNN, and ViT algorithms provide an overall accuracy of under 80% and a Kappa coefficient close to 0.5. The CECNN and VGG-19 damage detection models achieve an overall accuracy of 81.6% and 81.4%, respectively. The proposed method provides a considerable improvement in building damage mapping, with an overall accuracy of over 84% and a Kappa coefficient of over 0.68. All methods detect damaged buildings more effectively than non-damaged buildings. Furthermore, the CNN, Res-CNN, CECNN, VGG-19, and ViT algorithms provide recall and precision under 82%, while the proposed method provides more than 83%.

Table 3. Comparison of accuracy of different algorithms for building damage mapping for the first study area.

The confusion matrices from the building damage detection results are presented in Figure 12. These results indicate that the compared methods detected fewer than 160 out of 203 non-damaged building polygons, while the proposed method detected more than 170 out of 203 non-damaged building polygons. Similarly, the proposed method detected 185 out of the 221 buildings in the damaged class, while the other methods detected fewer than 174 damaged polygons. The elements of the secondary diagonal of the confusion matrix show the detection errors. For the proposed method, each of these elements is no more than 36 polygons, while the other methods misclassified more than 43 building polygons. We also observed that VGG-19, CECNN, and the proposed method have closer results in the secondary diagonal than other combinations of the compared methods.

Figure 12. Comparison of the confusion matrices of different algorithms for building damage mapping: (a) CNN, (b) Res-CNN, (c) CECNN, (d) VGG-19, (e) ViT, and (f) proposed method.

Results from damage mapping from the Haiti earthquake

The results of building damage mapping by the DL-based methods for the testing area of the Haiti earthquake are shown in Figure 13. As can be seen, most of the methods provide similar results in building damage mapping, although their details differ.

Figure 13. Visual comparison of the results of building damage mapping for the Haiti earthquake: (a) CNN, (b) Res-CNN, (c) CECNN, (d) VGG-19, (e) ViT, (f) the proposed method, and (g) ground-truth.

For clarity, we selected six random building polygons from the building damage mapping results and present them enlarged in Figure 14. We find that the proposed method performs well in mapping both damaged and non-damaged building polygons.

Figure 14. Enlarged sample buildings in study areas for the Haiti earthquake.

The numerical results of building damage mapping for the Haiti earthquake are presented in Table 4. Based on the quality measurement indices, all the methods yielded greater accuracy in the testing area than on the Bata explosion dataset. The overall accuracy of the DL-based methods ranged from 81% to 90%. Among them, the proposed method provides the highest accuracy, with an overall accuracy of 90.76%. The proposed method improves the overall accuracy by more than 9, 5, 3, 2, and 7 percentage points over the CNN, Res-CNN, CECNN, VGG-19, and ViT algorithms, respectively. Furthermore, a significant improvement can be seen in the Kappa coefficient: the improvement of the proposed method is more than 0.5 compared with other methods. The CECNN and VGG-19 models performed better than the proposed method for the non-damaged and damaged classes, respectively. However, these models lose performance in the detection of the damaged and non-damaged classes, respectively.

Table 4. Comparison of the accuracy of different classification algorithms for damage mapping for the Haiti earthquake.

The confusion matrices of the building damage detection methods are presented in Figure 15. As can be seen, most methods perform well in detecting non-damaged building polygons: of the 943 non-damaged building polygons, more than 758 are correctly detected by most methods, and the proposed method correctly detects 895. The proposed method is also effective in detecting damaged building polygons, correctly detecting 412 out of 497. The VGG-19 approach correctly detected 914 polygons, which is better than the proposed method, but its results are degraded in detecting damaged buildings. It is worth noting that the CECNN method provides the same performance in mapping damaged buildings, although it is less effective in mapping non-damaged buildings.

Figure 15. Comparison of the confusion matrices of different classification algorithms for damage mapping of the Haiti earthquake: (a) CNN, (b) Res-CNN, (c) CECNN, (d) VGG-19, (e) ViT, and (f) the proposed method.
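The overall accuracy (OA) and Kappa coefficient discussed here can both be derived from a confusion matrix. The following is a minimal sketch, not the authors' evaluation code: the function name is hypothetical, and the off-diagonal counts are obtained by subtracting the quoted correct detections (895 and 412) from the class totals (943 and 497).

```python
def overall_accuracy_and_kappa(cm):
    """Return (OA, Cohen's kappa) for a square confusion matrix.

    cm[i][j] = number of polygons with true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    # Observed agreement: fraction of polygons on the diagonal.
    observed = sum(cm[i][i] for i in range(n)) / total
    # Chance agreement: product of row and column marginals per class.
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Counts quoted in the text for the proposed method (Haiti test area);
# the off-diagonal cells are derived by subtraction (an assumption).
cm_proposed = [
    [895, 943 - 895],  # true non-damaged: correct, misclassified
    [497 - 412, 412],  # true damaged: misclassified, correct
]
oa, kappa = overall_accuracy_and_kappa(cm_proposed)
print(f"OA = {oa:.2%}, kappa = {kappa:.3f}")
```

With these counts the OA evaluates to (895 + 412) / 1440, which agrees with the 90.76% reported for the proposed method.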


XAI results in building damage mapping

This research used the Grad-CAM XAI model to visualize the critical portions of the input data for the proposed method. Table 5 shows the result of the Grad-CAM XAI model for some building polygons in the last convolution layer of the proposed method for the Bata explosion. For non-damaged buildings, the model attends to the whole area of the building. For damaged buildings, the model focuses on collapsed areas, where high texture drives the damage classification. Thus, the model learns to classify a building into the two classes based on its texture characteristics. Given this behaviour, the superior performance of the model is logical and not unexpected.

Table 5. Visualization results of Grad-CAM for the Bata explosion.
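As a reminder of what Grad-CAM computes, the sketch below implements its core step in plain Python: each feature map of the chosen convolution layer is weighted by the spatial average of the class-score gradient, the weighted maps are summed, and a ReLU keeps only positive evidence. The function name and the nested-list layout are assumptions made for illustration; in practice the activations and gradients would come from the trained network through a deep learning framework.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap from one layer's activations and gradients.

    activations[k][i][j]: feature map k of the chosen convolution layer.
    gradients[k][i][j]:   d(class score) / d(activations[k][i][j]).
    Returns a 2-D heatmap scaled to [0, 1].
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weight = spatial average of the gradients (global average pooling).
    weights = [sum(map(sum, g)) / (height * width) for g in gradients]
    # Weighted sum over channels, followed by ReLU.
    cam = [
        [
            max(0.0, sum(w * a[i][j] for w, a in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]
    # Normalise so that heatmaps of different polygons are comparable.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam
```

The resulting map is usually upsampled to the input size and overlaid on the building polygon, which is how highlighted regions such as debris areas become visible.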

Similarly, the result of employing the Grad-CAM algorithm for the Haiti earthquake is illustrated in Table 6. As seen, the model provides selected results from the Bata explosion dataset. This originates from the features of the buildings in this study area. For non-damaged buildings, the model focuses on different areas of the buildings, all of which have smooth textures. Moreover, since for damaged buildings the model concentrates on non-smooth texture areas (debris), the model learns the properties of both damaged and non-damaged areas.

Table 6. Visualization results of Grad-CAM for the Haiti earthquake.

Ablation analysis

This analysis aims to determine how removing a portion of the model affects overall performance. We investigated three scenarios: (S1) without the SE module, (S2) without the involution layer, and (S3) without the fully connected layer. Table 7 presents the result of the ablation analysis on building damage mapping for the Bata explosion. As can be seen, the fully connected layers have the lowest impact on the proposed framework. Furthermore, the involution layer plays a key role in the proposed method, since removing it reduces the performance of the model by more than 3% in terms of OA.

Table 7. Ablation analysis of the proposed method for the Bata explosion.

Table 8 provides the result of the ablation analysis for the Haiti earthquake. Similarly, all components play a key role in the performance of the proposed model. Based on these results, the SE module has the highest impact on the effectiveness of the model (S1). Further, the fully connected layers have the least impact on the proposed framework (S3).

Table 8. Ablation analysis of the proposed method for the Haiti earthquake.
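The comparison made in the ablation analysis amounts to measuring how much OA is lost in each scenario relative to the full model. A minimal sketch of that bookkeeping follows; the function name is hypothetical, and the OA values must be supplied from the ablation tables.

```python
def rank_ablation(full_oa, ablated_oa):
    """Rank ablation scenarios by the OA lost when a component is removed.

    full_oa:    OA of the complete model (in %).
    ablated_oa: dict mapping a scenario name to the OA obtained
                with that component removed.
    Returns (scenario, drop) pairs, largest drop first.
    """
    drops = {name: full_oa - oa for name, oa in ablated_oa.items()}
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)
```

The first entry of the returned list is the component whose removal hurts most, and the last entry is the component with the least impact.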

Table 9 illustrates the Haiti earthquake feature maps in the SE module and the involution layers. As seen, the involution layers focus on key points in the first and second layers, while the SE module considers a specific region in the earlier layers. Furthermore, the feature maps for the third layer show that both the involution layer and the SE module focus on the whole building surface when making a decision.

Table 9. An illustration of the feature maps of the SE module and involution layer for the Haiti earthquake.
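For readers unfamiliar with the SE module, the sketch below shows the standard squeeze-and-excitation computation in plain Python: global average pooling per channel (squeeze), two fully connected layers with ReLU and sigmoid (excitation), and channel-wise rescaling. It follows the generic SE formulation rather than the exact configuration used in this study; the function name, the weight layout, and the omission of biases are assumptions.

```python
import math


def squeeze_excite(features, w1, w2):
    """Apply a bias-free squeeze-and-excitation gate.

    features[c][i][j]: input feature maps with C channels.
    w1: (C/r) x C weights of the reduction layer.
    w2: C x (C/r) weights of the expansion layer.
    Returns feature maps of the same shape, rescaled per channel.
    """
    # Squeeze: one scalar per channel via global average pooling.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in features]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value of a channel by that channel's gate.
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, features)]
```

Because each gate lies between 0 and 1, the module can only attenuate channels, which is why its feature maps tend to emphasise a subset of regions.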

Discussion

This section discusses the challenges and issues related to the building damage mapping process and summarizes the performance of the proposed method in different scenarios. This research evaluates the performance of building damage detection in two different real-world study areas and compares the results with other state-of-the-art methods. Based on the results presented in the figures up to Figure 15 and the tables up to Table 4, the proposed method outperforms other state-of-the-art methods. This efficiency is demonstrated on both datasets and presented as numerical results in the accuracy comparison tables.

Most building damage detection methods use a bi-temporal pre/post-event dataset, and change detection-based methods of this kind can provide promising results. For instance, Gupta and Shah (Citation2020) proposed a deep learning-based framework for a bi-temporal dataset that achieved an overall score of 0.77. Furthermore, Merlin and Wiselin Jiji (Citation2019) reported an accuracy of 88% in damage detection with a DL-based method. Thus, the proposed method achieved an accuracy comparable to that of bi-temporal damage detection methods. It is worth noting that this study uses only a post-event dataset for damage mapping, whereas preparing a bi-temporal dataset and extracting the change information from it is more time-consuming and challenging. Furthermore, finding a post-event dataset requires more consideration, presenting a substantial challenge to building damage mapping. The proposed framework extracts the footprints of buildings from a building vector map, which is available from the OpenStreetMap (OSM) website and many organizations. In other words, the proposed method does not require additional (pre-event) datasets to be processed. In addition, focusing only on the post-event dataset helps to reduce the complexity of the model. This benefit can help apply damage mapping in real-world applications.

Unlike semantic segmentation DL-based methods, which demand a large sample dataset, the proposed method can be applied with a small sample dataset. A pre-trained model can also ease the challenge of obtaining sample data, but it does not suit all study areas because of negative transfer. Negative transfer arises when the dataset used to pre-train the model and the target dataset are not sufficiently similar. For example, Ji et al. (Citation2020) utilized a pre-trained CNN model for building damage detection that provided an overall accuracy of 88%. Thus, a pre-trained model is not effective in all situations. Because the structure of built-up areas differs between cities, this similarity cannot be assumed, which affects building damage detection results.

Based on the presented results, increasing the size of the sample dataset can improve the performance of DL-based methods. For example, the Bata explosion test case uses about 210 data samples, and the proposed method provides an OA of 84%. In contrast, the proposed method provides an overall accuracy of more than 90% for the Haiti earthquake, using 500 samples to train the model. Thus, training the model with a suitable sample dataset can considerably improve damage detection results.

The main differences between the buildings in the study areas are orientation, size, and color. The model can focus on the texture of buildings to classify them into non-damaged and damaged classes, and the XAI results confirm that it does so for damage mapping. It is worth noting that buildings are highly complex components of urban areas. Thus, their features might mislead the downstream model.

Generalization is an important criterion for DL-based methods in building damage detection. To evaluate the generalization of the DL-based method, we separated the training dataset from the testing areas for the Haiti earthquake dataset. The result of building damage detection for this dataset shows that the model generalizes well to unseen samples.

Table 10 compares the parameter counts of the deep learning models, which serve as an indicator of their respective computational costs. Notably, models such as VGG-19 and ViT, which have larger parameter counts, typically require more computational resources for both training and deployment. Conversely, our proposed model, with its leaner parameter count, has the potential to provide improved efficiency while maintaining performance. The CNN model has the lowest computational cost among the models compared, although this efficiency may come at the expense of optimal performance.

Table 10. A comparison of the computational cost of deep learning models.
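To illustrate why replacing some convolutions with involution can keep the parameter count low, the sketch below counts the weights of a single layer of each type. The involution count follows the kernel-generation design of Li et al. (2021), a 1x1 reduction followed by a 1x1 span; the layer sizes in the example are hypothetical and are not taken from the proposed network.

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Weights of a standard k x k convolution layer."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def involution_params(channels, k, groups, reduction):
    """Weights of the kernel-generating branch of an involution layer.

    A 1x1 reduction (C -> C/r) is followed by a 1x1 span (C/r -> k*k*G).
    Biases and normalisation parameters are ignored in this sketch.
    """
    reduced = channels // reduction
    return channels * reduced + reduced * k * k * groups


# Hypothetical layer sizes, chosen only to illustrate the arithmetic.
print(conv2d_params(64, 64, k=3))                         # 36928
print(involution_params(64, k=7, groups=4, reduction=4))  # 4160
```

In this example the involution layer uses a larger 7 x 7 kernel yet needs far fewer weights, because its kernel is generated from the input features instead of being stored per input-output channel pair.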

Conclusion

This study proposed a novel deep learning-based framework for rapid and accurate building damage detection in two different areas and event types (earthquake and explosion). Informative feature extraction is the most important task in supervised learning methods, yet common damage detection methods have ignored the capacity of advanced deep feature extraction methods. To this end, we proposed a DL framework that combines involution and convolution layers to increase the capacity of the model for robust feature extraction. Furthermore, the proposed method took advantage of a multi-scale block and an attention mechanism to improve the building damage detection results.

The results showed that the proposed method was highly effective in mapping building damage for both datasets. Our method achieved high accuracy with only post-event datasets, while other building damage detection methods focused on bi-temporal datasets. In addition, we utilized building polygon vectors so that the model focuses on classifying building polygons as damaged or non-damaged. This design helps to reduce computational cost and model complexity.

We employed the Grad-CAM XAI model to visualize critical features of the input data in the building damage mapping. The last convolution layer of the proposed method was employed for visualization. Results from the Grad-CAM XAI model demonstrated that the texture of the building had a key role in the classification results. The proposed model tended to classify a building with a smooth texture as non-damaged, while a building with a harsh texture (debris areas) was classified as damaged. Thus, it is worth noting that sample data that include the various types of debris areas allow the model to generate reliable results on damaged polygons. Grad-CAM XAI can therefore be used to analyze data in more detail and to build generic sample data that can be used in the future to improve the efficiency of the model.

Generally, the proposed method has several advantages: (1) it extracts robust deep features that can detect damage with less error and high accuracy; (2) it generalizes better than other DL-based methods; and (3) it uses only a post-event dataset for building damage mapping instead of a bi-temporal dataset.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

Publicly available datasets were analyzed in this study. The datasets are available here: [https://www.maxar.com/open-data]

References

  • Binici, B., Yakut, A., Canbay, E., Akpinar, U., & Tuncay, K. (2022). Identifying buildings with high collapse risk based on Samos earthquake damage inventory in İzmir. Bulletin of Earthquake Engineering, 20(14), 1–21. https://doi.org/10.1007/s10518-021-01289-5
  • Boloorani, A. D., Darvishi, M., Weng, Q., & Liu, X. (2021). Post-war urban damage mapping using InSAR: The case of Mosul city in Iraq. ISPRS International Journal of Geo-Information, 10(3), 140. https://doi.org/10.3390/ijgi10030140
  • Cotrufo, S., Sandu, C., Giulio Tonolo, F., & Boccardo, P. (2018). Building damage assessment scale tailored to remote sensing vertical imagery. European Journal of Remote Sensing, 51(1), 991–1005. https://doi.org/10.1080/22797254.2018.1527662
  • Cui, S., Yin, Y., Wang, D., Zhiwu, L., & Wang, Y. (2021). A stacking-based ensemble learning method for earthquake casualty prediction. Applied Soft Computing, 101, 107038. https://doi.org/10.1016/j.asoc.2020.107038
  • D’Addabbo, A., Pasquariello, G., & Amodio, A. (2022). Urban Change Detection from VHR Images via Deep-Features Exploitation. In: X. Yang, S. Sherratt, N. Dey, & A. Joshi (Eds.) Proceedings of Sixth International Congress on Information and Communication Technology. Lecture Notes in Networks and Systems, vol 236. Springer, Singapore. https://doi.org/10.1007/978-981-16-2380-6_43
  • Davies, T. R., Korup, O., & Clague, J. J. (2021). Geomorphology and natural hazards: Understanding landscape change for disaster mitigation. John Wiley & Sons.
  • DesRoches, R., Comerio, M., Eberhard, M., Mooney, W., & Rix, G. J. (2011). Overview of the 2010 Haiti earthquake. Earthquake Spectra, 27(1_suppl1), 1–21. https://doi.org/10.1193/1.3630129
  • Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., & Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929. https://openreview.net/forum?id=YicbFdNTTy
  • ElGharbawi, T., & Zarzoura, F. (2021). Damage detection using SAR coherence statistical analysis, application to Beirut, Lebanon. ISPRS Journal of Photogrammetry and Remote Sensing, 173, 1–9. https://doi.org/10.1016/j.isprsjprs.2021.01.001
  • Gupta, R., & Shah, M. (2020). Rescuenet: Joint building segmentation and damage assessment from satellite imagery [Paper presented]. 2020 25th International Conference on Pattern Recognition (ICPR). https://arxiv.org/abs/2004.07312v1
  • He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification [Paper presented]. Proceedings of the IEEE international conference on computer vision. https://arxiv.org/abs/1502.01852v1
  • Huynh, N. H., Böer, G., & Schramm, H. (2022). Self-attention and generative adversarial networks for algae monitoring. European Journal of Remote Sensing, 55(1), 10–22. https://doi.org/10.1080/22797254.2021.2010605
  • Jagannathan, J., & Divya, C. (2021). Deep learning for the prediction and classification of land use and land cover changes using deep convolutional neural network. Ecological Informatics, 65, 101412. https://doi.org/10.1016/j.ecoinf.2021.101412
  • Ji, M., Liu, L., Runlin, D., & Buchroithner, M. F. (2019). A comparative study of texture and convolutional neural network features for detecting collapsed buildings after earthquakes using pre-and post-event satellite imagery. Remote Sensing, 11(10), 1202. https://doi.org/10.3390/rs11101202
  • Ji, M., Liu, L., Zhang, R., & Buchroithner, M. F. (2020). Discrimination of earthquake-induced building destruction from space using a pretrained CNN model. Applied Sciences, 10(2), 602. https://doi.org/10.3390/app10020602
  • Jundullah, M. R., & Wahyu Wijayanto, A. (2022). Natural disaster identification and mapping of Tsunami and earthquake in Indonesia using satellite imagery analysis. Aceh.
  • Kakogeorgiou, I., & Karantzalos, K. (2021). Evaluating explainable artificial intelligence methods for multi-label deep learning classification tasks in remote sensing. International Journal of Applied Earth Observation and Geoinformation, 103, 102520. https://doi.org/10.1016/j.jag.2021.102520
  • Kalantar, B., Ueda, N., Al-Najjar, H. A., & Abdul Halin, A. (2020). Assessment of convolutional neural network architectures for earthquake-induced building damage detection based on pre-and post-event orthophoto images. Remote Sensing, 12(21), 3529. https://doi.org/10.3390/rs12213529
  • Kapucu, N., Özerdem, A., & Sadiq, A.-A. (2022). Managing emergencies and crises: Global perspectives. Jones & Bartlett Learning.
  • Kingma, D. P., & Ba, J. (2017). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. http://arxiv.org/abs/1412.6980
  • Langer, M., Oster, D., Speith, T., Hermanns, H., Kästner, L., Schmidt, E., Sesing, A., & Baum, K. (2021). What do we want from explainable artificial intelligence (XAI)?–A stakeholder perspective on XAI and a conceptual model guiding interdisciplinary XAI research. Artificial Intelligence, 296, 103473. https://doi.org/10.1016/j.artint.2021.103473
  • Li, Q., Gong, L., & Zhang, J. (2019). A correlation change detection method integrating PCA and multi-texture features of SAR image for building damage detection. European Journal of Remote Sensing, 52(1), 435–447. https://doi.org/10.1080/22797254.2019.1630322
  • Li, D., Hu, J., Wang, C., Li, X., She, Q., Zhu, L., Zhang, T., & Chen, Q. (2021). Involution: Inverting the inherence of convolution for visual recognition [Paper presented]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. https://arxiv.org/abs/2103.06255v2
  • Lin, Q., Tianyu, C., Wang, L., Kumar Mondal, S., Yin, H., & Wang, Y. (2022). Transfer learning for improving seismic building damage assessment. Remote Sensing, 14(1), 201. https://doi.org/10.3390/rs14010201
  • Meng, Z., Zhao, F., Liang, M., & Xie, W. (2021). Deep residual involution network for hyperspectral image classification. Remote Sensing, 13(16), 3055. https://doi.org/10.3390/rs13163055
  • Merlin, G. S., & Wiselin Jiji, G. (2019). Building damage detection of the 2004 Nagapattinam, India, Tsunami using the texture and spectral features from IKONOS images. Journal of the Indian Society of Remote Sensing, 47(1), 13–24. https://doi.org/10.1007/s12524-018-0858-z
  • Naito, S., Tomozawa, H., Mori, Y., Nagata, T., Monma, N., Nakamura, H., Fujiwara, H., & Shoji, G. (2020). Building-damage detection method based on machine learning utilizing aerial photographs of the Kumamoto earthquake. Earthquake Spectra, 36(3), 1166–1187. https://doi.org/10.1177/8755293019901309
  • Natteshan, N. V. S., & Suresh Kumar, N. (2020). Effective SAR image segmentation and classification of crop areas using MRG and CDNN techniques. European Journal of Remote Sensing, 53(sup1), 126–140. https://doi.org/10.1080/22797254.2020.1727777
  • Petch, J., Shuang, D., & Nelson, W. (2021). Opening the black box: The promise and limitations of explainable machine learning in cardiology. Canadian Journal of Cardiology, 38(2), 204–213. https://doi.org/10.1016/j.cjca.2021.09.004
  • Qing, Y., Ming, D., Wen, Q., Weng, Q., Xu, L., Chen, Y., Zhang, Y., & Zeng, B. (2022). Operational earthquake-induced building damage assessment using CNN-based direct remote sensing change detection on superpixel level. International Journal of Applied Earth Observation and Geoinformation, 112, 102899. https://doi.org/10.1016/j.jag.2022.102899
  • Rojat, T., Puget, R., Filliat, D., Del Ser, J., Gelin, R., & Díaz-Rodríguez, N. (2021). Explainable artificial intelligence (xai) on timeseries data: A survey. arXiv preprint arXiv:2104.00950. https://arxiv.org/abs/2104.00950v1
  • Rupnik, E., Nex, F., Toschi, I., & Remondino, F. (2018). Contextual classification using photometry and elevation data for damage detection after an earthquake event. European Journal of Remote Sensing, 51(1), 543–557. https://doi.org/10.1080/22797254.2018.1458584
  • Sattarzadeh, S., Sudhakar, M., Plataniotis, K. N., Jang, J., Jeong, Y., & Kim, H. (2021). Integrated Grad-CAM: Sensitivity-Aware visual Explanation of deep convolutional networks via Integrated Gradient-based Scoring [Paper presented]. ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada.
  • Seydi, S. T., & Hasanlou, M. (2021). A new structure for binary and multiple hyperspectral change detection based on spectral unmixing and convolutional neural network. Measurement, 186(1), 110137. https://doi.org/10.1016/j.measurement.2021.110137
  • Seydi, S. T., Hasanlou, M., & Chanussot, J. (2021). DSMNN-Net: A deep siamese morphological neural network model for burned area mapping using multispectral sentinel-2 and hyperspectral PRISMA images. Remote Sensing, 13(24), 5138. https://doi.org/10.3390/rs13245138
  • Stomberg, T. T., Stone, T., Leonhardt, J., & Roscher, R. (2022). Exploring wilderness using explainable machine learning in satellite imagery. arXiv preprint arXiv:2203.00379. https://arxiv.org/abs/2203.00379v3
  • Talreja, P., Durbha, S. S., Shinde, R. C., & Potnis, A. V. (2021). Real-time Embedded HPC based earthquake damage mapping using 3D LiDAR Point Clouds [Paper presented]. 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium.
  • Ünlü, R., & Kiriş, R. (2021). Detection of damaged buildings after an earthquake with convolutional neural networks in conjunction with image segmentation. The Visual Computer, 38(2), 1–10. https://doi.org/10.1007/s00371-020-02043-9
  • Ünlü, R., & Kiriş, R. (2022). Detection of damaged buildings after an earthquake with convolutional neural networks in conjunction with image segmentation. The Visual Computer, 38(2), 685–694. https://doi.org/10.1007/s00371-020-02043-9
  • Wang, H., & Miao, F. (2022). Building extraction from remote sensing images using deep residual U-Net. European Journal of Remote Sensing, 55(1), 71–85. https://doi.org/10.1080/22797254.2021.2018944
  • Wang, C., Zhang, Y., Xie, T., Guo, L., Chen, S., Junyong, L., & Shi, F. (2022). A detection method for collapsed buildings combining post-earthquake high-resolution optical and synthetic aperture radar images. Remote Sensing, 14(5), 1100. https://doi.org/10.3390/rs14051100
  • Weber, E., Papadopoulos, D. P., Lapedriza, A., Ofli, F., Imran, M., & Torralba, A. (2022). Incidents1M: A large-scale dataset of images with natural disasters, damage, and incidents. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–14. arXiv preprint arXiv:2201.04236. https://doi.org/10.1109/TPAMI.2022.3191996
  • Xu, X., Chen, Y., Zhang, J., Chen, Y., Anandhan, P., & Manickam, A. (2021). A novel approach for scene classification from remote sensing images using deep learning methods. European Journal of Remote Sensing, 54(sup2), 383–395. https://doi.org/10.1080/22797254.2020.1790995
  • Yu, C., Han, R., Song, M., Liu, C., & Chang, C.-I. (2020). A simplified 2D-3D CNN architecture for hyperspectral image classification based on spatial–spectral fusion. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13, 2485–2501. https://doi.org/10.1109/JSTARS.2020.2983224
  • Zheng, Z., Zhong, Y., Wang, J., Ailong, M., & Zhang, L. (2021). Building damage assessment for rapid disaster response with a deep object-based semantic change detection framework: From natural disasters to man-made disasters. Remote Sensing of Environment, 265, 112636. https://doi.org/10.1016/j.rse.2021.112636