Research Article

CG-CFPANet: a multi-task network for built-up area extraction from SDGSAT-1 and Sentinel-2 remote sensing images

Article: 2310092 | Received 02 May 2023, Accepted 19 Jan 2024, Published online: 31 Jan 2024

ABSTRACT

Accurate extraction of built-up areas supports urban development planning and map updating. Nighttime light (NTL) data can capture the lighting signals of ground objects; however, most built-up area extraction has been conducted on publicly available NTL images with coarse spatial resolution. The Sustainable Development Science Satellite-1 (SDGSAT-1) provides 10 m spatial resolution panchromatic NTL images, making it possible to map detailed urban lighting structures. In urban extraction, the boundaries of urban areas are easily confused with background objects because of their similar spectral and textural features. To address this problem, we propose a multi-task deep learning model, CG-CFPANet, to extract illuminated built-up areas by synthesizing SDGSAT-1 NTL data and optical remote sensing images. In CG-CFPANet, a convolutional feature pyramid attention (CFPA) module is developed for better contextual recognition and a concatenation group (CG) module is developed to merge the two types of remote sensing images. Our proposed CG-CFPANet achieved 1.3% higher precision in built-up area extraction than ten other recently proposed network structures: UNet, UNet++, PSPNet, DeeplabV3, FCN, ExtremeC3Net, SegNet, BiseNet, Res2-UNet, and CBRNet. It shows high applicability for large-scale built-up area extraction.

This article is part of the following collections:
Innovative approaches and applications on SDGs using SDGSAT-1

1. Introduction

With the development of the global economy and rapid urbanization, the built-up areas are expanding rapidly worldwide. The world’s total built-up area in 2018 was 797,076 km2, which was 1.5 times larger than in 1990 (Gong et al. Citation2020). Urban land cover is expected to increase by 1.2 million km2 by 2030, almost three times the global urban land area in 2000 (Wang et al. Citation2020). The accurate extraction of built-up areas will assist with environmental protection measures, updating of maps, and urban development planning. It is therefore very important to develop efficient methods for accurate built-up area extraction.

Multitemporal information (Chen et al. Citation2023; Yu et al. Citation2022) and big remotely sensed satellite data have shown great potential for mapping urbanization dynamics (Li, Gong, and Liang Citation2015; Zhu et al. Citation2019). Currently, medium- (Schneider, Friedl, and Potere Citation2009; Liu et al. Citation2018) and high-resolution (Chen et al. Citation2022; Yu et al. Citation2022; Lv et al. Citation2023; Lv et al. Citation2023; Lv et al. Citation2023) data are widely used in urban area delineation research. Compared to traditional daytime remotely sensed products, nighttime satellite images can more effectively reflect human activities. Nighttime light (NTL) images record the light emitted by ground objects and have proven to be an effective data source for urban area extraction, resulting in their wide use for built-up area mapping (Yu, Yang and Chen Citation2018). The NTL data provided by the Suomi National Polar-orbiting Partnership (S-NPP) satellite (Choi and Cao Citation2019) and the National Oceanic and Atmospheric Administration (NOAA-20) satellite (Choi and Cao Citation2019) are extensively used to extract built-up areas. These NTL images have a coarse spatial resolution of 750 m, so only the approximate location of a built-up area can be captured. Luojia-1, successfully launched in 2018, provides NTL images with a spatial resolution of 130 m (Cui et al. Citation2023). Compared to the Visible Infrared Imaging Radiometer Suite (VIIRS) NTL data, which has a 500 m spatial resolution, the spatial resolution of Luojia-1 is a significant improvement, but the Luojia-1 data have not been updated since 2020. In November 2021, the Sustainable Development Science Satellite-1 (SDGSAT-1) (Qi et al. Citation2022) was launched; it is the world's first scientific satellite dedicated to serving the United Nations 2030 Agenda for Sustainable Development. The satellite carries three payloads: a thermal infrared spectrometer (TIS), a glimmer imager for urbanization (GIU), and a multispectral imager for inshore areas (MII). It orbits the Earth at an altitude of 505 km with an inclination of 97.5°. The three payloads, combined with their 300 km-wide swaths, provide all-time, all-weather, multi-payload cooperative observations, achieving global coverage in 11 days. The GIU is used mainly for NTL detection in different classes of cities and villages, and for the scientific exploration of urban aerosols at night (Li et al. Citation2023); it provides panchromatic and RGB NTL images with spatial resolutions of 10 m and 40 m, respectively, over a 300 km swath (Guo et al. Citation2020; Zhang et al. Citation2022). Compared to other NTL data, GIU images have the advantages of high resolution, multispectral characteristics, and large swath width, and can provide detailed built-up area outlines as well as urban interior structure information (Yu et al. Citation2023a; Yu et al. Citation2023b).

Numerous methods have been proposed for built-up area extraction from NTL images. Song et al. (Citation2011) used threshold segmentation to extract urban built-up areas from Defense Meteorological Satellite Program (DMSP) Operational Linescan System (OLS) NTL data. However, they found that significant errors occurred when fixed thresholds were applied separately to NTL data from other years. To address this problem and improve the accuracy and robustness of built-up area extraction from NTL datasets, Yu et al. (Citation2018) proposed a preprocessing method that applies a logarithmic transformation to the original S-NPP/VIIRS NTL composite data, which significantly improved the accuracy of the extracted built-up areas. Sun et al. (Citation2020) adopted a binary segmentation method to extract built-up areas, supplemented by an urban vegetation index, and used an extremum search algorithm to determine the boundaries. Li et al. (Citation2020) used a statistical data comparison method, which relies on land surface temperature data, to extract built-up areas from Luojia-1 and NPP/VIIRS NTL images. However, these threshold methods classify pixels based on image color features and often require additional urban metrics to improve accuracy, such as human settlement and urban vegetation indices (Liang et al. Citation2020), which vary by region; incorporating these factors can improve the robustness and reliability of such methods. In recent years, machine learning methods, such as support vector machines (SVMs) and clustering algorithms, have been widely used for classification in remote sensing images. Liu et al. (Citation2019) extracted built-up areas with an SVM by fusing VIIRS NTL and Landsat-8 data. Chen et al. (Citation2020) applied a pixel-based clustering algorithm to distinguish urban areas from the background. However, SVMs are sensitive to missing data, and clustering algorithms (Zadeh, Fathian, and Gholamian Citation2014) are easily affected by noise and abnormal data, which limits the accuracy of these methods.

Although traditional threshold segmentation methods perform well in most simple cases, their performance may decline in complex scenes. Machine learning methods rely heavily on feature engineering (McGauley and Nolan Citation2011), which may lead to poor classification results when inappropriate features are chosen. The model we propose in this study builds on the strong feature learning ability of deep learning frameworks, which allows illuminated built-up areas to be extracted directly from remote sensing images.

Deep learning can avoid feature engineering through successive convolution operations, and semantic segmentation, which assigns a semantic label to each pixel of an image, is a widely used deep learning framework. Semantic segmentation has been widely used for built-up area extraction (Hu et al. Citation2023). Hu et al. extracted built-up areas from NTL images based on UNet, and Abdollahi et al. proposed a model that combines SegNet and UNet to extract built-up areas. Nevertheless, these methods face challenges in accurately capturing the intricate features of urban built-up areas in the presence of complex backgrounds (Abdollahi, Pradhan, and Alamri Citation2022). Tan et al. proposed a semantic segmentation model named LMB-CNN to extract built-up areas (Tan, Xiong, and Yan Citation2020). To fully extract image features, the encoder of their model has three branches and can extract multiple diverse features, but it is still difficult to extract accurate boundaries because of the similar spectral and textural features of urban areas and background objects. To address these shortcomings, we propose a concatenation group (CG) module and a convolutional feature pyramid attention (CFPA) module. The CG module fully integrates the features of the NTL image and the optical remote sensing image to improve the detail of the result map, and the CFPA module enhances the context recognition ability of the model to improve accuracy in complex environments. Furthermore, our deep learning model is specifically designed to synthesize NTL and optical remote sensing images, and even in complex scenes it can achieve high accuracy. Our comparative experiments show that, with the assistance of optical images, the accuracy is noticeably improved.

2. Related work

Deep learning neural networks have strong generalization ability and robustness (Karabayir, Akbilgic, and Tas Citation2021; Cai and Hu Citation2020), and can learn features automatically without the need to manually set threshold parameters. Semantic segmentation is a typical deep learning framework that is capable of pixel-level classification (Ma and Chang Citation2022). With the introduction of fully convolutional networks (FCNs) (Li et al. Citation2021) and UNet (Ronneberger, Fischer, and Brox Citation2015), semantic segmentation methods have developed rapidly. Zhou et al. (Citation2020) proposed UNet++ (Figure 1), an improved structure of UNet that concatenates more feature maps from the encoder than UNet. Zhao et al. suggested the Pyramid Scene Parsing Network (PSPNet) (Long, Zhang, and Zhao Citation2020), which is widely used due to the superiority of its spatial pyramid pooling structure (Ke, Le, and Yao Citation2020). The feature map in the pyramid pooling module is divided into four different sizes by pooling layers (Duan et al. Citation2022), allowing the module to extract contextual information (Li et al. Citation2022) around the target. Li et al. (Citation2018) proposed the feature pyramid attention (FPA) module (Figure 2), an improved version of the spatial pyramid pooling structure.

Figure 1. The architecture of UNet++.


Figure 2. The FPA module based on the pooling pyramid.


Owing to the advantages of the newly developed deep learning algorithms for object extraction summarised above, deep learning methods have been used and improved in recent years to extract building objects from high- and very high resolution (VHR) remote sensing images (Yu et al. Citation2022; Yu et al. Citation2022). Xu et al. (Citation2018) used the Res-U-Net semantic segmentation model, combined with the normalized difference vegetation index (NDVI), the normalized digital surface model (NDSM), and the first component of the principal component analysis (PCA1), to achieve high-performance building extraction from VHR data. Res-U-Net is a U-shaped structure in which a ResNet module is integrated. ResNet can avoid the accuracy degradation caused by over-convolution; however, the structure lacks a multi-scale fusion module and cannot improve the model's ability to recognize context. Zhao et al. (Citation2023) proposed the multiscale receptive field network (MSRF-Net), a model that uses convolutional layers of different sizes to enhance feature extraction, enabling buildings to be extracted from remote sensing images. However, the existing methods for built-up area extraction are applied mainly to NTL data; further study is needed to investigate how optical remote sensing data can be applied to further improve the accuracy of built-up area extraction.

Since there is limited research exploring automatic methods for built-up area extraction based on the recently released SDGSAT-1 nighttime light images and optical images, we propose a new semantic segmentation framework that combines NTL and optical remote sensing images to extract urban built-up areas. It achieves higher accuracy under complex background interference and produces clearer urban area boundaries. To improve the accuracy of the multi-task framework in extracting built-up areas from NTL data, high spatial resolution optical remote sensing imagery is used as an auxiliary feature. In this study, a new model, CG-CFPANet, was proposed to fully integrate multiple features from optical remote sensing and NTL images to achieve the accurate extraction of built-up areas. The study was conducted by: (1) developing the convolutional feature pyramid attention (CFPA) module and the concatenation group (CG) module, which were synthesised in the CG-CFPANet model, to fully integrate the increased number of features, and then (2) applying the proposed model in the eastern part of the Yangtze River Delta to analyse its applicability and effectiveness.

3. Methodology

The model consists of four parts: an encoder based on the classical VGG16 backbone (Theckedath and Sedamkar Citation2020; Qassim, Verma, and Feinzimer Citation2018), a decoder based on bilinear upsampling (Lin et al. Citation2022; Wang et al. Citation2018), the newly proposed CG module, and the CFPA module. The CG and CFPA modules connect the encoder and decoder: the CG module is used to enhance the extracted feature maps, and the CFPA module is proposed to improve the context recognition capability.

3.1. The encoder module

The encoder module extracts features from both NTL images and optical remote sensing images through sequential convolutional and pooling layers (Shi, Xu, and Li Citation2018). Our proposed model has two inputs: NTL data and optical remote sensing images. The NTL image provides direct information on the location of the built-up area through its urban light signal, and the optical remote sensing image improves the accuracy of the NTL data through its detailed daytime ground object features. Both input images are 512 × 512 pixels in size with three channels before they are synthesized in the encoder. The two input images are merged into one feature map by concatenation, a widely used feature synthesis operation (Noreen et al. Citation2020). The merged feature map is encoded into 1024 channels, with a size of 64 × 64 pixels, through seven successive convolution and pooling operations.
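The following minimal PyTorch sketch illustrates this two-input fusion and downsampling idea. The plain convolution blocks used in place of the VGG16 backbone, the channel widths, and the number of pooling steps are simplifying assumptions for readability, not the exact published configuration.

```python
import torch
import torch.nn as nn

class DualInputEncoder(nn.Module):
    """Sketch of the two-branch input fusion: NTL and optical patches are
    concatenated along the channel axis and reduced to a 64 x 64 x 1024 map."""

    def __init__(self):
        super().__init__()
        def block(c_in, c_out):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
            )
        self.stage1 = nn.Sequential(block(6, 128), nn.MaxPool2d(2))    # 512 -> 256
        self.stage2 = nn.Sequential(block(128, 256), nn.MaxPool2d(2))  # 256 -> 128
        self.stage3 = nn.Sequential(block(256, 512), nn.MaxPool2d(2))  # 128 -> 64
        self.stage4 = block(512, 1024)                                  # stays 64 x 64

    def forward(self, ntl, optical):
        # ntl and optical: (B, 3, 512, 512) each; fuse by channel concatenation
        x = torch.cat([ntl, optical], dim=1)   # (B, 6, 512, 512)
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        f4 = self.stage4(f3)                   # (B, 1024, 64, 64)
        return [f1, f2, f3, f4]                # multi-scale maps passed to the CG module
```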

3.2. The decoder module

The decoder restores the feature map from 64 × 64 pixels to the original input size of 512 × 512 pixels by upsampling six times through bilinear interpolation, reducing the number of channels from 1024 to 2. The two-channel feature map generates the final binary classification image, in which the built-up area is indicated by an intensity of 255 and background objects by an intensity of 0. To improve the feature extraction and fusion capabilities of our proposed model, a convolution layer is adopted after each upsampling operation (Lu et al. Citation2017).
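A minimal sketch of this upsample-then-convolve pattern follows. For brevity it uses three doubling steps (64 to 512 pixels) rather than the six upsampling operations described above, and the intermediate channel widths are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class BilinearDecoder(nn.Module):
    """Sketch of the decoder: bilinear upsampling, each step followed by a
    convolution, restoring 64 x 64 x 1024 features to a 512 x 512 x 2 output."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1024, 256, kernel_size=3, padding=1)  # after 64 -> 128
        self.conv2 = nn.Conv2d(256, 64, kernel_size=3, padding=1)    # after 128 -> 256
        self.conv3 = nn.Conv2d(64, 2, kernel_size=3, padding=1)      # after 256 -> 512

    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        x = F.relu(self.conv1(x))
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        x = F.relu(self.conv2(x))
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        return self.conv3(x)  # (B, 2, 512, 512): built-up vs. background scores
```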

3.3. The CG module

Our proposed model (CG-CFPANet) extracts built-up areas from NTL images, supplemented by optical remote sensing images. The NTL images reflect human activities directly: they provide coupled information on NTL intensity and lighting location, with the illuminated area indicating the built-up area. In optical remote sensing images, most built-up ground objects have regular shapes that clearly differ from the irregular shapes of ground objects in the natural environment; additionally, most concrete built-up surfaces have a distinctive grey colour. To enhance the fusion and extraction of features from these two different image sources, we developed a CG module that fuses feature maps by concatenation at different network branches, increasing the capability to fuse the two sets of features.

Our proposed CG module has six input feature maps and four output feature maps (Figures 3 and 4). The inputs are taken from the feature maps in the encoder, which makes full use of the features extracted at different scales. Each pair of adjacent feature maps is fused by concatenation to form a new set of feature maps (layer 1 in Figure 4). The adjacent feature maps in layer 1 are further concatenated to generate the output feature maps, which are used to generate the output extraction result in the decoder.
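A minimal sketch of this pairwise fusion is given below. Resizing each neighbouring map to a common spatial size before concatenation is an assumption made here so that the channel-wise concatenation is valid; it is not a detail specified above.

```python
import torch
import torch.nn.functional as F

def concat_group(features):
    """Sketch of the CG idea: adjacent feature maps are fused by concatenation
    (layer 1), and the fused maps are concatenated again, so six encoder maps
    become four output maps for the decoder."""
    def cat_adjacent(maps):
        fused = []
        for a, b in zip(maps[:-1], maps[1:]):
            # bring b to the spatial size of a before concatenating channels
            b = F.interpolate(b, size=a.shape[-2:], mode="bilinear", align_corners=False)
            fused.append(torch.cat([a, b], dim=1))
        return fused

    layer1 = cat_adjacent(features)   # 6 maps -> 5 fused maps
    return cat_adjacent(layer1)       # 5 maps -> 4 output maps
```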

Figure 3. The structure of the proposed CG-CFPANet model.


Figure 4. The architecture of the concat group.


*Each feature map number F (*) corresponds to the number in .

3.4. The CFPA module

Contextual information can improve extraction accuracy (Jawarneh et al. Citation2020) by helping to identify suspicious built-up areas. For example, whether a suspicious built-up area is surrounded by streets or by mountains will affect the model's judgement of the built-up target. It is therefore necessary to extract the contextual information around the target to improve the accuracy of the built-up area extraction results. A CFPA module for contextual information extraction was therefore developed (Figure 5). Learning from the FPA module (Figure 2), the pooling layers in the pyramid module were replaced by convolutional layers to obtain a stronger feature extraction capacity. Inspired by PSPNet (Yu et al. Citation2018), our proposed CFPA module divides the feature map into four different sizes using convolutional layers. The convolutional layer has a higher extraction capability than the pooling layer and is therefore able to extract more features than the pyramid pooling module in PSPNet.

Figure 5. The architecture of the CFPA module.


The convolutional layers in the CFPA module can control the size of the output feature map through their parameters. The output size follows the standard convolution relations: (1) \( W' = \frac{W - F + 2P}{S} + 1 \) and (2) \( H' = \frac{H - F + 2P}{S} + 1 \), where \(W\) and \(H\) are the width and height of the input feature map; \(W'\) and \(H'\) are the width and height of the output feature map; \(F\) is the size of the convolution kernel; \(P\) is the padding; and \(S\) is the stride.

*The number F (*) corresponds to the feature map number in .
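As an illustration of the CFPA idea, the sketch below reduces the input feature map to smaller sizes with strided convolutions rather than pooling, merges the multi-scale context, and uses it to re-weight the input. The branch widths, kernel sizes, fusion by addition, and the sigmoid re-weighting are assumptions rather than the exact published design; each strided convolution roughly halves the map size, consistent with equations (1) and (2) for F = 3, P = 1, and S = 2.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CFPA(nn.Module):
    """Sketch of a convolutional feature pyramid: four scales (input, 1/2, 1/4, 1/8)
    produced by strided convolutions, merged coarse-to-fine and applied as attention."""

    def __init__(self, channels):
        super().__init__()
        self.down1 = nn.Conv2d(channels, channels, 3, stride=2, padding=1)  # 1/2 size
        self.down2 = nn.Conv2d(channels, channels, 3, stride=2, padding=1)  # 1/4 size
        self.down3 = nn.Conv2d(channels, channels, 3, stride=2, padding=1)  # 1/8 size
        self.out = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        size = x.shape[-2:]
        c1 = F.relu(self.down1(x))
        c2 = F.relu(self.down2(c1))
        c3 = F.relu(self.down3(c2))
        # merge the pyramid from coarse to fine by upsampling and addition
        c2 = c2 + F.interpolate(c3, size=c2.shape[-2:], mode="bilinear", align_corners=False)
        c1 = c1 + F.interpolate(c2, size=c1.shape[-2:], mode="bilinear", align_corners=False)
        ctx = F.interpolate(c1, size=size, mode="bilinear", align_corners=False)
        return x * torch.sigmoid(self.out(ctx))  # context acts as an attention weight
```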

4. Experiment and results

4.1. Study area and data samples

The study area was located in the eastern part of the Yangtze River Delta and included five cities: Shanghai, Ningbo, Zhoushan, Suzhou, and Jiaxing (Figure 6). Most of the study area is a plain, with flat terrain in the north and mountains in the south. The study area contained different types of built-up areas and natural environments, providing samples with different object backgrounds, and was therefore used to evaluate the proposed methodology in urban areas in a comprehensive and objective way.

Figure 6. The location of the study area. The optical remote sensing images were provided by Sentinel-2 with a true color combination of red, green, and blue channels.


Figure 7. The model accuracy convergence results during the training process.


The data samples used in this study were optical remote sensing images, NTL datasets, and the GlobeLand30 land cover product (Huang et al. Citation2016). The optical remote sensing images were provided by the Sentinel-2 satellite and were acquired in 2021. They had a spatial resolution of 10 m and were RGB four-band images. In the optical remote sensing images provided by the Sentinel-2 satellite, there is a clear distinction between built-up areas and the natural environment. The nighttime light images were obtained from SDGSAT-1; the SDGSAT-1 data were captured in December 2021 in the form of panchromatic images with a spatial resolution of 10 m. The GlobeLand30 product for 2020 was selected to provide an appropriate land cover reference for the built-up area extraction within the urban area using the method established in Section 3. GlobeLand30 (2020) is an open-access 30 m resolution global land cover product developed by the National Geomatics Center of China (http://globeland30.org/). GlobeLand30 provides land cover data for 10 categories of ground objects, including artificial surfaces, glaciers, grasslands, water bodies, forests, cultivated land, and bare land. The artificial surfaces category was used as the location of built-up areas in our study. GlobeLand30 has an average accuracy of over 80% and a Kappa index of 0.75 (Chen et al. Citation2017; Ma et al. Citation2017), which can provide reliable guidance for built-up area mapping. In addition, the GlobeLand30 product has been widely used for urban land extraction in many recently published works (Dai et al. Citation2022; Ma et al. Citation2017; Wu, Zhao, and Jiang Citation2018).

The three types of images were cropped to the study area, and the GlobeLand30 data were resampled to the 10 m spatial resolution of the SDGSAT-1 image (23,040 × 32,100 pixels) so that pixels at the same position in the three images corresponded to each other. All three data sources were cropped into 2800 patches of 512 × 512 pixels. To augment the number of data samples, the patches were increased from 2800 to 4200 by data augmentation; the strategies included random zoom (to 90% and 110%), random rotation (by 90°, 180°, and 270°), and random cropping. Of these 4200 images, 2000 were randomly selected for training, 1000 for validation, and 1200 for testing. To ensure the reliability of the model test results, the 1200 ground truth images used for testing were annotated by visual interpretation based on the Sentinel-2 images.
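A minimal sketch of this augmentation, under the assumption that nearest-neighbour resampling is used for the zoom and that the image and label patches are transformed together, is:

```python
import numpy as np

def augment_patch(image, label, rng=np.random.default_rng()):
    """Sketch of the augmentation above: random 90/180/270-degree rotation,
    random zoom to 90% or 110%, and random cropping back to 512 x 512 pixels.
    image: (H, W, bands); label: (H, W) binary mask."""
    # random rotation by a multiple of 90 degrees
    k = int(rng.integers(0, 4))
    image, label = np.rot90(image, k, axes=(0, 1)), np.rot90(label, k, axes=(0, 1))

    # random zoom by nearest-neighbour index resampling
    scale = rng.choice([0.9, 1.1])
    h, w = label.shape[:2]
    rows = (np.arange(int(h * scale)) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(int(w * scale)) / scale).astype(int).clip(0, w - 1)
    image, label = image[rows][:, cols], label[rows][:, cols]

    # pad if the zoomed patch is smaller than 512 x 512, then crop randomly
    pad_h, pad_w = max(0, 512 - image.shape[0]), max(0, 512 - image.shape[1])
    if pad_h or pad_w:
        image = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="reflect")
        label = np.pad(label, ((0, pad_h), (0, pad_w)), mode="reflect")
    top = int(rng.integers(0, image.shape[0] - 512 + 1))
    left = int(rng.integers(0, image.shape[1] - 512 + 1))
    return (image[top:top + 512, left:left + 512],
            label[top:top + 512, left:left + 512])
```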

4.2. Implementation details

This study was conducted using PyTorch on an NVIDIA RTX 3060 GPU. Considering the memory capacity of the GPU, the batch size was set to four. To avoid the convergence instability caused by the small batch size, we adopted a strategy of adjusting the learning rate dynamically. Because the early convergence of the training model is relatively fast, the initial learning rate was set to 0.001. As the convergence rate of the model slows with increasing epochs, the learning rate is decreased dynamically to avoid unstable convergence caused by a high learning rate: the learning rate is multiplied by 0.1 every 10 epochs during training. As shown in Figure 7, this learning rate strategy, determined through trial and error, gave the best training performance, with the proposed model achieving its best accuracy after 40 epochs of training (fast convergence between the 1st and 25th epochs, then slower convergence until the 40th epoch). We chose Adam as the optimizer (Bechikh, Chaabani, and Ben Said Citation2015), a widely used optimization strategy that can dynamically adjust the learning rate to fit the training process (Khan et al. Citation2020).
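This setup corresponds to the following sketch, in which make_optimizer is an illustrative helper (not part of any published code) and model stands for the CG-CFPANet instance:

```python
from torch import optim

def make_optimizer(model):
    """Adam with an initial learning rate of 0.001, multiplied by 0.1 every
    10 epochs, matching the schedule described above."""
    optimizer = optim.Adam(model.parameters(), lr=1e-3)
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
    return optimizer, scheduler

# Training runs for 40 epochs with a batch size of 4:
#   for epoch in range(40):
#       ... one pass over the training loader, optimizer.step() per batch ...
#       scheduler.step()   # lr: 1e-3 -> 1e-4 after epoch 10 -> 1e-5 after epoch 20
```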

Because built-up area classification is a binary classification task, we chose binary cross-entropy as the loss function. It measures the difference between the model's predicted probability of each pixel belonging to the foreground (target) and the background. The binary cross-entropy loss (BCELoss) (Bai et al. Citation2022) is calculated with equation (3): (3) \( \mathrm{BCELoss} = -\frac{1}{n}\sum_{i=1}^{n}\left[ y_i \log p(y_i) + (1 - y_i)\log\big(1 - p(y_i)\big) \right] \), where \(y_i\) is 0 or 1, referring to the black (0, background) or white (1, built-up) pixels of the label image, and \(p(y_i)\) is the predicted probability for pixel \(i\).
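Written out directly, equation (3) corresponds to the sketch below; it is equivalent to torch.nn.BCELoss, and the clamping constant is only an implementation detail added here to avoid log(0).

```python
import torch

def bce_loss(prob, target, eps=1e-7):
    """Binary cross-entropy of equation (3): prob holds the predicted built-up
    probabilities p(y_i), target holds the binary labels y_i; averaged over all pixels."""
    prob = prob.clamp(eps, 1.0 - eps)
    return -(target * torch.log(prob) + (1.0 - target) * torch.log(1.0 - prob)).mean()

# usage: loss = bce_loss(torch.sigmoid(builtup_logits), ground_truth_mask.float())
```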

4.3. Evaluation metrics

The metrics used to evaluate the performance of built-up area segmentation include frames per second (FPS), precision, recall, F1-score (Huang et al. Citation2015), and the mean intersection over union (MIoU) (Behera, Rath, and Sethy Citation2021). The FPS is used to evaluate the computational efficiency of the methods. The MIoU is the average of the IoU, which measures the proportion of the predicted built-up area that overlaps with the ground truth and is calculated using equation (4). The F1-score, recall, and precision are calculated using equations (5)-(7). Recall measures the proportion of ground truth built-up area pixels that are correctly identified, and precision measures the proportion of pixels classified as built-up area by the model that are correct. (4) \( \mathrm{IoU} = \frac{TP}{TP + FN + FP} \), (5) \( \mathrm{F1} = \frac{2TP}{2TP + FN + FP} \), (6) \( \mathrm{Recall} = \frac{TP}{TP + FN} \), (7) \( \mathrm{Precision} = \frac{TP}{TP + FP} \), where TP is the number of correctly classified built-up area pixels, TN is the number of correctly classified background pixels, FP is the number of background pixels wrongly classified as built-up area, and FN is the number of built-up area pixels wrongly classified as background.
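Equations (4)-(7) can be computed directly from the confusion-matrix counts of a predicted binary map, as in the following sketch (1 = built-up area, 0 = background):

```python
import numpy as np

def binary_metrics(pred, truth):
    """IoU, F1-score, recall, and precision of equations (4)-(7) for one
    binary prediction map against its ground truth."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    iou = tp / (tp + fn + fp)
    f1 = 2 * tp / (2 * tp + fn + fp)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return iou, f1, recall, precision
```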

4.4. Ablation study

To assess the performance of the CFPA and CG modules, an ablation experiment was designed in which the CFPA and CG modules were removed from the proposed model and the evaluations were conducted using the same strategy. Table 1 shows the ablation results. With both the CG and CFPA modules, the precision, recall, and F1 score were 82.9%, 83.4%, and 83.1%, respectively. When the CG module was removed, the precision, recall, and F1 score decreased by 2.6%, 1.2%, and 1.9%, respectively, indicating that the CG module improved the extraction accuracy of the built-up areas. Similarly, when the CFPA module was removed, the precision, recall, and F1 score were reduced by 3.1%, 1.5%, and 2.3%, respectively. These findings demonstrate that the CFPA module improved the accuracy of built-up area extraction by enhancing the context recognition of the model.

Table 1. Results of an ablation experiment with the removal of the CFPA and CG modules.

4.5. The effect of auxiliary features

To evaluate the effect of the auxiliary features, an alternative model was designed in which the auxiliary feature was removed; it was then compared with the proposed model. The test results are shown in Table 2. The method with the optical remote sensing images achieved a better performance in terms of MIoU, precision, recall, and F1 score, with increases of 1.9%, 0.8%, 2.7%, and 1.2%, respectively. This result indicates that providing additional features to the model helps to improve the accuracy of built-up area extraction.

Table 2. Results with the inclusion of an auxiliary feature.

4.6. Comparison of different methods

The performance of the proposed model was compared with eight published classical semantic segmentation models: UNet (Ronneberger, Fischer, and Brox Citation2015), SegNet (Zheng et al. Citation2021), Deeplabv3+ (Yu et al. Citation2022), ExtremeC3Net (Park et al. Citation2019), BiseNet (Kim and Jo Citation2022), SiNet (Hu et al. Citation2019), PSPNet (Long, Zhang, and Zhao Citation2020), and UNet++ (Zhou et al. Citation2020). Additionally, to demonstrate the superiority of the proposed model, we also compared it with two other models proposed in recent years for building extraction, Res2-UNet (Chen et al. Citation2022) and CBRNet (Guo et al. Citation2022), based on the same training and testing samples. Table 3 shows the results of the comparison. The CG-CFPANet model achieved a better MIoU, precision, recall, and F1 score, with these metrics increased by at least 0.7%, 1.3%, 1.0%, and 1.2%, respectively, compared to the other models. To clearly illustrate the comparative performance of the different methods, the results of a visual comparison of two typical evaluation cases are presented in Figure 8. The figure shows that the results of our proposed model were the most consistent with the ground truth and could accurately distinguish the mountains around the town. In Figure 8(a.13), our model could distinguish the streets that extend into the mountains, and in Figure 8(b.13), it was able to distinguish small forest areas within a block.

Figure 8. Visual results of the built-up area extraction, where (*).1 is the optical remote sensing image patch used for testing; (*).2 is the NTL image patch used for testing; (*).3 is the ground truth image; (*).4 is the result from UNet++; (*).5 is the result from SegNet; (*).6 is the result from BiseNet; (*).7 is the result from Deeplab V3+; (*).8 is the result from PSPNet; (*).9 is the result from UNet; (*).10 is the result from SiNet; (*).11 is the result from ExtremeC3Net; (*).12 is the result from CBRNet; (*).13 is the result from Res2-UNet; (*).14 is the result from CG-CFPANet.


Table 3. Model comparisons for our test sample.

Figure 9. Visual comparison of ground truth and extraction results from all comparative models: (a) original images; (b) results from UNet++; (c) results from SegNet; (d) results from BiseNet; (e) results from Deeplab V3+; (f) results from PSPNet; (g) results from UNet; (h) results from SiNet; (i) results from ExtremeC3Net; (j) results from CBRNet; (k) results from Res2-UNet; (l) results from CG-CFPANet.


5. Discussion

Existing built-up area extraction methods struggle to extract accurate boundaries in complex scenes, as demonstrated by the limitations of the UNet method used by Hu et al. for built-up area extraction (Hu et al. Citation2023). To address these limitations, we proposed a new semantic segmentation framework with CG and CFPA modules that can extract accurate urban built-up areas against complex backgrounds. The CG module synthesizes the feature maps of the nighttime light and optical remote sensing images at different scales in the encoder through continuous concatenation, while the CFPA module enhances the contextual feature learning capability by resizing the feature maps to four different sizes and fusing them together. Our proposed CG-CFPANet model can therefore extract built-up areas robustly and accurately, with stronger feature fusion and context recognition capacity; it is a novel semantic segmentation model that synthesizes NTL image feature maps and optical remote sensing image feature maps.

As shown in the visual comparison results, the built-up area boundaries extracted by our proposed CG-CFPANet are more intact than those of the compared models UNet, UNet++, PSPNet, DeeplabV3, FCN, ExtremeC3Net, SegNet, BiseNet, Res2-UNet, and CBRNet. Among the compared results, ExtremeC3Net and SegNet also produce clear urban boundaries, but with 7.7% and 2.3% lower MIoU, respectively, than our proposed CG-CFPANet; this is mainly due to their lack of contextual recognition. The pyramid pooling module in PSPNet is similar to the CFPA module in our proposed network; however, PSPNet only utilizes the feature map from the last layer of the encoder, without fully leveraging the feature maps from the other parts of the encoder. Thus, the PSPNet predictions for urban areas were less accurate than those of CG-CFPANet, with 2.2% lower MIoU and 2.0% lower precision, especially in the border areas (.c). UNet++ exhibits a notable feature fusion capability, but compared to CG-CFPANet it falls short in context recognition, which decreases its urban area prediction accuracy; for instance, as shown in Figure 8(b.4), UNet++ mistakenly classifies a small forest within an urban area as part of the urban area. Our model still performs well in complex scenes, such as group C of Figure 8, where the river crossing the built-up area is wrongly identified as built-up area by most of the compared models, whereas our proposed model accurately recognizes the river as background. Although UNet can also correctly identify the river as background, large parts of the built-up area are wrongly recognized as background by UNet. In group D of Figure 8, a small number of bare land pixels near towns are wrongly identified as built-up area by SegNet, CBRNet, and Res2-UNet. Although models such as UNet++, BiseNet, DeepLabV3+, PSPNet, UNet, and ExtremeC3Net can correctly identify bare land, they extract only coarse boundaries; our proposed model extracts the built-up area with the clearest boundaries (as shown in Figure 8(d.14)). In terms of small-scale built-up area extraction, SegNet, CBRNet, and Res2-UNet failed in group D of Figure 8, and BiseNet and PSPNet could only extract parts, whereas UNet++, SiNet, ExtremeC3Net, and our proposed CG-CFPANet extracted more intact small-scale built-up areas. In addition, Deeplab V3+, ExtremeC3Net, and CBRNet have relatively more misclassified pixels (marked in yellow), indicating that these models mistakenly identify more background as built-up area, while SegNet, UNet, and SiNet mistakenly identify more built-up area as background (marked in blue). Our proposed model clearly reduces these misclassifications by extracting more intact contextual features from the remote sensing images through the proposed CFPA module.

It should be pointed out that the inference speed of our proposed model is lower than that of the other models, as reflected in its smaller FPS in Table 3; however, it is still acceptable, being less than 0.5 f/s slower than the widely used DeepLabv3+ framework while achieving 2.3% higher MIoU. Res2-UNet performs satisfactorily in most scenarios and has excellent efficiency, with an FPS 2.9 f/s higher than that of our proposed CG-CFPANet; however, CG-CFPANet surpasses Res2-UNet with 0.7% higher MIoU and 1.3% higher precision. With the highest extraction accuracy and satisfactory implementation efficiency, our proposed CG-CFPANet has strong applicability for practical urban area extraction. Furthermore, in our future studies we will focus on optimizing the proposed model structure to improve its implementation speed.

It is noted that our proposed CG-CFPANet model still faces challenges in distinguishing complex fields and bare land near built-up areas, with some background objects being incorrectly detected as built-up areas ((a)). Such issues can be further addressed in future research by expanding the variety of training samples used to construct the model. In addition, to demonstrate the robustness of our proposed approach, we plan to enlarge the scale of the study area in future work to improve and evaluate the proposed framework more objectively.

6. Conclusion

In this study, a two-branch semantic segmentation network (CG-CFPANet) was proposed to extract urban areas from a synthesized SDGSAT-1 NTL image and an optical remote sensing image. The CG and CFPA modules were deployed in the network to improve the model accuracy in extracting built-up area. The CG module improved the feature fusion and extraction capacities by fusing feature maps for different positions in the encoder and improved the precision by 2.9% in the evaluation experiments. The CFPA module added more convolutional layers to the pyramid pooling module to achieve a higher context recognition and feature learning capability. It improved the precision by 3.4% in the evaluation experiments. Compared to ten typical semantic segmentation frameworks, our proposed CG-CFPANet obtained at least a 0.7% higher MIoU in urban area extraction. The proposed multi-task framework enabled the effective extraction of urban areas from SDGSAT-1 NTL images. The method used in this study has important potential practical applications for large-scale urban area extraction.

Compared to the ten typical semantic segmentation frameworks, our proposed CG-CFPANet achieved a 0.7% higher MIoU than Res2-UNet, the best performing model among all the models in the comparison. Although an increase of 0.7% in MIoU may not appear substantial, it is a challenging task to further enhance performance beyond the best existing model.

Moreover, although our proposed model does not have the highest FPS, it is still acceptable, being less than 0.5 f/s slower than the widely used DeepLabv3+. With the highest MIoU and F1 scores, CG-CFPANet has a satisfactory capacity for practical urban area extraction. In addition, in our future studies, the structure of the model will be optimized to improve its speed.

Acknowledgements

The authors thank the International Research Centre of Big Data for Sustainable Development Goals (CBAS) for providing SDGSAT-1 data. The authors thank the National Geomatics Center of China (NGCC) for providing GlobeLand30 land cover data. The authors thank the anonymous reviewers for their valuable comments.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Correction Statement

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

Additional information

Funding

The research was financially supported by the National Key R&D Program of China (No. 2022YFC3800701), the China-ASEAN Big Earth Data Platform and Applications (CADA, guikeAA20302022), and the Youth Innovation Promotion Association, CAS (2022122).

References

  • Abdollahi, A., B. Pradhan, and A. M. Alamri. 2022. “An Ensemble Architecture of Deep Convolutional Segnet and Unet Networks for Building Semantic Segmentation from High-Resolution Aerial Images.” Geocarto International 37 (12): 3355–3370. https://doi.org/10.1080/10106049.2020.1856199.
  • Bai, Z., J. Wang, X. L. Zhang, and J. Chen. 2022. “End-to-End Speaker Verification via Curriculum Bipartite Ranking Weighted Binary Cross-Entropy.” IEEE/ACM Transactions on Audio, Speech, and Language Processing 30: 1330–1344. https://doi.org/10.1109/TASLP.2022.3161155.
  • Bechikh, S., A. Chaabani, and L. Ben Said. 2015. “An Efficient Chemical Reaction Optimization Algorithm for Multiobjective Optimization.” IEEE Transactions on Cybernetics 45 (10): 2051–2064. https://doi.org/10.1109/TCYB.2014.2363878.
  • Behera, S. K., A. K. Rath, and P. K. Sethy. 2021. “Fruits Yield Estimation Using Faster R-CNN with MIoU.” Multimedia Tools and Applications 80 (12): 19043–19056. https://doi.org/10.1007/s11042-021-10704-7.
  • Cai, W., and D. Hu. 2020. “QRS Complex Detection Using Novel Deep Learning Neural Networks.” IEEE Access 8: 97082–97089. https://doi.org/10.1109/ACCESS.2020.2997473.
  • Chen, J., X. Cao, S. Peng, and H. R. Ren. 2017. “Analysis and Applications of GlobeLand30: A Review.” ISPRS International Journal of Geo-Information 6 (8): 230. https://doi.org/10.3390/ijgi6080230.
  • Chen, F., J. X. Wang, B. Li, A. Q. Yang, and M. M. Zhang. 2023. “Spatial Variability in Melting on Himalayan Debris-Covered Glaciers from 2000 to 2013.” Remote Sensing of Environment 291: 113560. https://doi.org/10.1016/j.rse.2023.113560.
  • Chen, F., N. Wang, B. Yu, and L. Wang. 2022. “Res2-Unet, a New Deep Architecture for Building Detection from High Spatial Resolution Images.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 15: 1494–1501. https://doi.org/10.1109/JSTARS.2022.3146430.
  • Chen, X. X., F. Zhang, Z. H. Du, and R. I. Liu. 2020. “An Unsupervised Urban Extent Extraction Method from NPP-VIIRS Nighttime Light Data.” Remote Sensing 12 (22): 3810. https://doi.org/10.3390/rs12223810.
  • Choi, T., and C. Cao. 2019. “NOAA-20 VIIRS Relative Spectral Response Effects on Solar Diffuser Degradation and On-Orbit Radiometric Calibration.” IEEE Transactions on Geoscience and Remote Sensing 60: 1–7. https://doi.org/10.1109/TGRS.2021.3101695.
  • Choi, T., and C. Cao. 2019. “S-NPP VIIRS On-Orbit Calibration Coefficient Improvements with Yaw Maneuver Reanalysis.” IEEE Transactions on Geoscience and Remote Sensing 57 (10): 7460–7465. https://doi.org/10.1109/TGRS.2019.2913502.
  • Cui, Y., H. Zha, L. Jiang, M. Zhang, and K. Shi. 2023. “Luojia 1–01 Data Outperform Suomi-NPP VIIRS Data in Estimating CO2 Emissions in the Service, Industrial, and Urban Residential Sectors.” IEEE Geoscience and Remote Sensing Letters 20: 1–5. https://doi.org/10.1109/LGRS.2023.3244931.
  • Dai, X. L., J. F. Jin, Q. H. Chen, and X. Fang. 2022. “On Physical Urban Boundaries, Urban Sprawl, and Compactness Measurement: A Case Study of the Wen-Tai Region, China.” Land 11 (10): 1637. https://doi.org/10.3390/land11101637.
  • Duan, Y., J. Wang, H. Ma, and Y. Sun. 2022. “Residual Convolutional Graph Neural Network with Subgraph Attention Pooling.” Tsinghua Science and Technology 27 (4): 653–663. https://doi.org/10.26599/TST.2021.9010058.
  • Gong, P., X. C. Li, J. Wang, Y. Q. Bai, B. Chen, T. Y. Hu, X. P. Liu, et al. 2020. “Annual Maps of Global Artificial Impervious Area (GAIA) between 1985 and 2018.” Remote Sensing of Environment 236: 111510. https://doi.org/10.1016/j.rse.2019.111510.
  • Guo, H., H. Chen, L. Chen, and F. U. Bihong. 2020. “Progress on CASEarth Satellite Development.” Chinese Journal of Space Science 40 (5): 707–717. https://doi.org/10.11728/cjss2020.05.707.
  • Guo, H. N., B. Du, L. P. Zhang, and X. Su. 2022. “A Coarse-to-Fine Boundary Refinement Network for Building Footprint Extraction from Remote Sensing Imagery.” ISPRS Journal of Photogrammetry and Remote Sensing 183: 240–252. https://doi.org/10.1016/j.isprsjprs.2021.11.005.
  • Hu, P., J. H. Cheng, Ping Li, and Yuyao Wang. 2023. “Automatic Extraction of Built-Up Area in Chinese Urban Agglomerations Based on the Deep Learning Method Using NTL Data.” Geocarto International 38 (1): 2246939. https://doi.org/10.1080/10106049.2023.2246939.
  • Hu, X., X. Xu, Y. Xiao, H. Chen, S. He, J. Qin, and P. A. Heng. 2019. “SINet: A Scale-Insensitive Convolutional Neural Network for Fast Vehicle Detection.” IEEE Transactions on Intelligent Transportation Systems 20 (3): 1010–1019. https://doi.org/10.1109/TITS.2018.2838132.
  • Huang, X., Q. Li, H. Liu, and J. Li. 2016. “Assessing and Improving the Accuracy of GlobeLand30 Data for Urban Area Delineation by Combining Multisource Remote Sensing Data.” IEEE Geoscience and Remote Sensing Letters 13 (12): 1860–1864. https://doi.org/10.1109/LGRS.2016.2615318.
  • Huang, H., H. Xu, X. Wang, and W. Silamu. 2015. “Maximum F1-Score Discriminative Training Criterion for Automatic Mispronunciation Detection.” IEEE/ACM Transactions on Audio, Speech, and Language Processing 23 (4): 787–797. https://doi.org/10.1109/TASLP.2015.2409733.
  • Jawarneh, I. M. A., P. Bellavista, A. Corradi, L. Foschini, R. Montanari, J. Berrocal, and J. M. Murillo. 2020. “A Pre-Filtering Approach for Incorporating Contextual Information Into Deep Learning Based Recommender Systems.” IEEE Access 8: 40485–40498. https://doi.org/10.1109/ACCESS.2020.2975167.
  • Karabayir, I., O. Akbilgic, and N. Tas. 2021. “A Novel Learning Algorithm to Optimize Deep Neural Networks: Evolved Gradient Direction Optimizer (EVGO).” IEEE Transactions on Neural Networks and Learning Systems 32 (2): 685–694. https://doi.org/10.1109/TNNLS.2020.2979121.
  • Ke, Z., C. Le, and Y. Yao. 2020. “A Multivariate Grey Incidence Model for Different Scale Data Based on Spatial Pyramid Pooling.” Journal of Systems Engineering and Electronics 31 (4): 770–779. https://doi.org/10.23919/JSEE.2020.000052.
  • Khan, A. H., X. Cao, S. Li, V. N. Katsikis, and L. Liao. 2020. “BAS-ADAM: An ADAM Based Approach to Improve the Performance of Beetle Antennae Search Optimizer.” IEEE/CAA Journal of Automatica Sinica 7 (2): 461–471. https://doi.org/10.1109/JAS.2020.1003048.
  • Kim, Seongmin, and Kanghyun Jo. 2022. “BiSeNet with Depthwise Attention Spatial Path for Semantic Segmentation.” Paper presented at the 2022 International Workshop on Intelligent Systems (IWIS), Ulsan, Korea, August 2022. https://doi.org/10.1109/IWIS56333.2022.9920717.
  • Li, C. R., F. Chen, N. Wang, B. Yu, and L. Wang. 2023. “SDGSAT-1 Nighttime Light Data Improve Village-Scale Built-up Delineation.” Remote Sensing of Environment 297: 113764. https://doi.org/10.1016/j.rse.2023.113764.
  • Li, X. C., P. Gong, and L. Liang. 2015. “A 30-Year (1984–2013) Record of Annual Urban Dynamics of Beijing City Derived from Landsat Data.” Remote Sensing of Environment 166: 78–90. https://doi.org/10.1016/j.rse.2015.06.007.
  • Li, L., X. Li, X. Liu, W. Huang, Z. Hu, and F. Chen. 2021. “Attention Mechanism Cloud Detection with Modified FCN for Infrared Remote Sensing Images.” IEEE Access 9: 150975–150983. https://doi.org/10.1109/ACCESS.2021.3122162.
  • Li, H. C., P. F. Xiong, J. An, and L. X. Wang. 2018. “Pyramid Attention Network for Semantic Segmentation.” Computer Science. https://doi.org/10.48550/arXiv.1805.10180
  • Li, F., Q. W. Yan, Z. F. Bian, B. L. Liu, and Z. H. Wu. 2020. “A POI and LST Adjusted NTL Urban Index for Urban Built-Up Area Extraction.” Sensors 20 (10): 2918. https://doi.org/10.3390/s20102918.
  • Li, W., Z. Zhao, A. A. Liu, Z. Gao, C. Yan, Z. Mao, H. Chen, and W. Nie. 2022. “Joint Local Correlation and Global Contextual Information for Unsupervised 3D Model Retrieval and Classification.” IEEE Transactions on Circuits and Systems for Video Technology 32 (5): 3265–3278. https://doi.org/10.1109/TCSVT.2021.3099496.
  • Liang, L., T. Huang, L. Di, D. Geng, J. Yan, S. Wang, L. Wang, L. Li, B. Chen, and J. Kang. 2020. “Influence of Different Bandwidths on LAI Estimation Using Vegetation Indices.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 13: 1494–1502. https://doi.org/10.1109/JSTARS.2020.2984608.
  • Lin, B., G. Yang, Q. Zhang, and G. Zhang. 2022. “Semantic Segmentation Network Using Local Relationship Upsampling for Remote Sensing Images.” IEEE Geoscience and Remote Sensing Letters 19: 1–5. https://doi.org/10.1109/LGRS.2020.3047443.
  • Liu, X. P., G. H. Hu, Y. M. Chen, X. Li, X. C. Xu, S. M. Li, F. S. Pei, and S. Wang. 2018. “High-Resolution Multi-Temporal Mapping of Global Urban Land Using Landsat Images Based on the Google Earth Engine Platform.” Remote Sensing of Environment 209: 227–239. https://doi.org/10.1016/j.rse.2018.02.055.
  • Liu, C., K. Yang, M. M. Bennett, Z. Y. Guo, L. Cheng, and M. C. Li. 2019. “Automated Extraction of Built-Up Areas by Fusing VIIRS Nighttime Lights and Landsat-8 Data.” Remote Sensing 11 (13): 1571. https://doi.org/10.3390/rs11131571.
  • Long, X., W. Zhang, and B. Zhao. 2020. “PSPNet-SLAM: A Semantic SLAM Detect Dynamic Object by Pyramid Scene Parsing Network.” IEEE Access 8: 214685–214695. https://doi.org/10.1109/ACCESS.2020.3041038.
  • Lu, X. J., X. Duan, X. P. Mao, Y. Y. Li, and X. D. Zhang. 2017. “Feature Extraction and Fusion Using Deep Convolutional Neural Networks for Face Detection.” Mathematical Problems in Engineering 2017: 1376726. https://doi.org/10.1155/2017/1376726.
  • Lv, Z., P. Zhang, W. Sun, J. A. Benediktsson, J. Li, and W. Wang. 2023. “Novel Adaptive Region Spectral–Spatial Features for Land Cover Classification with High Spatial Resolution Remotely Sensed Imagery.” IEEE Transactions on Geoscience and Remote Sensing 61: 1–12. https://doi.org/10.1109/TGRS.2023.3275753.
  • Lv, Z., P. Zhong, W. Wang, Z. You, J. A. Benediktsson, and C. Shi. 2023. “Novel Piecewise Distance Based on Adaptive Region Key-Points Extraction for LCCD with VHR Remote-Sensing Images.” IEEE Transactions on Geoscience and Remote Sensing 61: 1–9. https://doi.org/10.1109/TGRS.2023.3268038.
  • Lv, Z., P. Zhong, W. Wang, Z. You, and N. Falco. 2023. “Multiscale Attention Network Guided with Change Gradient Image for Land Cover Change Detection Using Remote Sensing Images.” IEEE Geoscience and Remote Sensing Letters 20: 1–5. https://doi.org/10.1109/LGRS.2023.3267879.
  • Ma, K. Y., and C. I. Chang. 2022. “Kernel-Based Constrained Energy Minimization for Hyperspectral Mixed Pixel Classification.” IEEE Transactions on Geoscience and Remote Sensing 60: 1–23. https://doi.org/10.1109/TGRS.2021.3085801.
  • Ma, X. L., X. H. Tong, S. C. Liu, X. Luo, H. Xie, and C. M. Li. 2017. “Optimized Sample Selection in SVM Classification by Combining with DMSP-OLS, Landsat NDVI and GlobeLand30 Products for Extracting Urban Built-Up Areas.” Remote Sensing 9 (3): 236. https://doi.org/10.3390/rs9030236.
  • McGauley, M. G., and D. S. Nolan. 2011. “Measuring Environmental Favorability for Tropical Cyclogenesis by Statistical Analysis of Threshold Parameters.” Journal of Climate 24 (23): 5968–5997. https://doi.org/10.1175/2011JCLI4176.1.
  • Noreen, N., S. Palaniappan, A. Qayyum, I. Ahmad, M. Imran, and M. Shoaib. 2020. “A Deep Learning Model Based on Concatenation Approach for the Diagnosis of Brain Tumor.” IEEE Access 8: 55135–55144. https://doi.org/10.1109/ACCESS.2020.2978629.
  • Park, H., L. L. Sjösund, Y. J. Yoo, and N. Kwak. 2019. “ExtremeC3Net: Extreme Lightweight Portrait Segmentation Networks Using Advanced C3-Modules.” Computer Science. https://doi.org/10.48550/arXiv.1908.03093.
  • Qassim, H., A. Verma, and D. Feinzimer. 2018. “Compressed Residual-VGG16 CNN Model for big Data Places Image Recognition.” 2018 IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC).
  • Qi, L., L. Li, X. Ni, X. Zhou, and F. Chen. 2022. “On-Orbit Spatial Quality Evaluation of SDGSAT-1 Thermal Infrared Spectrometer.” IEEE Geoscience and Remote Sensing Letters 19: 1–5. https://doi.org/10.1109/LGRS.2022.3200209.
  • Ronneberger, O., P. Fischer, and T. Brox. 2015. “U-Net: Convolutional Networks for Biomedical Image Segmentation.” Paper presented at the Medical Image Computing and Computer Assisted Intervention, MICCAI, Cham, November, 2015. https://doi.org/10.1007/978-3-319-24574-4_28.
  • Schneider, A., M. A. Friedl, and D. Potere. 2009. “A New Map of Global Urban Extent from MODIS Satellite Data.” Environmental Research Letters 4 (4): 044003. https://doi.org/10.1088/1748-9326/4/4/044003.
  • Shi, H., M. Xu, and R. Li. 2018. “Deep Learning for Household Load Forecasting—A Novel Pooling Deep RNN.” IEEE Transactions on Smart Grid 9 (5): 5271–5280. https://doi.org/10.1109/TSG.2017.2686012.
  • Song, S., B. L. Yu, J. P. Wu, and H. X. Liu. 2011. “Methods for Deriving Urban Built-Up Area Using Night-Light Data: Assessment and Application.” Remote Sensing Technology and Application 26 (2): 169–176. https://doi.org/10.11873/j.issn.1004-0323.2011.2.169.
  • Sun, L. S., Y. H. Han, Z. W. Xie, and R. R. Li. 2020. “Neighbourhood Extremum Method of Extracting Urban Built-Up Area Using Nighttime Lighting Data.” Geomatics and Information Science of Wuhan University 45 (10): 1619. https://doi.org/10.13203/j.whugis20190010.
  • Tan, Y. H., S. Z. Xiong, and P. Yan. 2020. “Multi-branch Convolutional Neural Network for Built-up Area Extraction from Remote Sensing Image.” Neurocomputing 396: 358–374. https://doi.org/10.1016/j.neucom.2018.09.106.
  • Theckedath, D., and R. R. Sedamkar. 2020. “Detecting Affect States Using VGG16, ResNet50 and SE-ResNet50 Networks.” SN Computer Science 1 (2): 79. https://doi.org/10.1007/s42979-020-0114-9.
  • Wang, P., P. Chen, Y. Yuan, D. Liu, Z. Huang, X. Hou, and G. Cottrell. 2018. “Understanding Convolution for Semantic Segmentation.” Paper presented at the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, March 2018. https://doi.org/10.1109/WACV.2018.00163.
  • Wang, L., Y. H. Jia, X. H. Li, and P. Gong. 2020. “Analysing the Driving Forces and Environmental Effects of Urban Expansion by Mapping the Speed and Acceleration of Built-Up Areas in China between 1978 and 2017.” Remote Sensing 12 (23): 3929. https://doi.org/10.3390/rs12233929.
  • Wu, W. J., H. R. Zhao, and S. L. Jiang. 2018. “A Zipf’s Law-Based Method for Mapping Urban Areas Using NPP-VIIRS Nighttime Light Data.” Remote Sensing 10 (1): 130. https://doi.org/10.3390/rs10010130.
  • Xu, Y. Y., L. Wu, Z. Xie, and Z. L. Chen. 2018. “Building Extraction in Very High Resolution Remote Sensing Imagery Using Deep Learning and Guided Filters.” Remote Sensing 10 (1): 144. https://doi.org/10.3390/rs10010144.
  • Yu, B., F. Chen, N. Wang, L. Wang, and H. D. Guo. 2023a. “Assessing Changes in Nighttime Lighting in the Aftermath of the Turkey-Syria Earthquake Using SDGSAT-1 Satellite Data.” The Innovation 4 (3): 100419. https://doi.org/10.1016/j.xinn.2023.100419.
  • Yu, B., F. Chen, C. Ye, Z. W. Li, Y. Dong, N. Wang, and L. Wang. 2023b. “Temporal Expansion of the Nighttime Light Images of SDGSAT-1 Satellite in Illuminating Ground Object Extraction by Joint Observation of NPP-VIIRS and Sentinel-2A Images.” Remote Sensing of Environment 295: 113691. https://doi.org/10.1016/j.rse.2023.113691.
  • Yu, B., C. Xu, F. Chen, N. Wang, and L. Wang. 2022. “HADeenNet: A Hierarchical-Attention Multi-Scale Deconvolution Network for Landslide Detection.” International Journal of Applied Earth Observation and Geoinformation 111: 102853. https://doi.org/10.1016/j.jag.2022.102853.
  • Yu, B., L. Yang, and F. Chen. 2018. “Semantic Segmentation for High Spatial Resolution Remote Sensing Images Based on Convolution Neural Network and Pyramid Pooling Module.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 11 (9): 3252–3261. https://doi.org/10.1109/JSTARS.2018.2860989.
  • Yu, B., A. Q. Yang, F. Chen, N. Wang, and L. Wang. 2022. “SNNFD, Spiking Neural Segmentation Network in Frequency Domain Using High Spatial Resolution Images for Building Extraction.” International Journal of Applied Earth Observation and Geoinformation 112: 102930. https://doi.org/10.1016/j.jag.2022.102930.
  • Yu, L., Z. Zeng, A. Liu, X. Xie, H. Wang, F. Xu, and W. Hong. 2022. “A Lightweight Complex-Valued DeepLabv3+ for Semantic Segmentation of PolSAR Image.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 15: 930–943. https://doi.org/10.1109/JSTARS.2021.3140101.
  • Zadeh, M. R. D., M. Fathian, and M. R. Gholamian. 2014. “A New Method for Clustering Based on Development of Imperialist Competitive Algorithm.” China Communications 11 (12): 54–61. https://doi.org/10.1109/CC.2014.7019840.
  • Zhang, D. G., B. Cheng, L. Shi, J. Gao, T. F. Long, B. Chen, and G. Z. Wang. 2022. “A Destriping Algorithm for SDGSAT-1 Nighttime Light Images Based on Anomaly Detection and Spectral Similarity Restoration.” Remote Sensing 14 (21): 5544. https://doi.org/10.3390/rs14215544.
  • Zhao, Y., G. Sun, L. Zhang, A. Zhang, X. Jia, and Z. Han. 2023. “MSRF-Net: Multiscale Receptive Field Network for Building Detection from Remote Sensing Images.” IEEE Transactions on Geoscience and Remote Sensing 61: 1–18. https://doi.org/10.1109/TGRS.2023.3282926.
  • Zheng, X., S. Zhang, X. Li, G. Li, and X. Li. 2021. “Lightweight Bridge Crack Detection Method Based on SegNet and Bottleneck Depth-Separable Convolution with Residuals.” IEEE Access 9: 161649–161668. https://doi.org/10.1109/ACCESS.2021.3133712.
  • Zhou, Z., M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang. 2020. “UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation.” IEEE Transactions on Medical Imaging 39 (6): 1856–1867. https://doi.org/10.1109/TMI.2019.2959609.
  • Zhu, Z., Y. Y. Zhou, K. C. Seto, E. C. Stokes, C. B. Deng, S. T. A. Pickett, and H. Taubenböck. 2019. “Understanding an Urbanizing Planet: Strategic Directions for Remote Sensing.” Remote Sensing of Environment 228: 164–182. https://doi.org/10.1016/j.rse.2019.04.020.