Research Article

A method for building extraction in remote sensing images based on Swin Transformer

Article: 2353113 | Received 06 Dec 2023, Accepted 04 May 2024, Published online: 15 May 2024
 

ABSTRACT

Building segmentation in remote sensing images, which is essential for land use analysis and urban planning, is evolving with advances in deep learning. Conventional methods based on convolutional neural networks struggle to integrate local and global information and to establish long-range dependencies, resulting in suboptimal segmentation in complex scenarios. This paper proposes LMSwin_PNet, a novel segmentation network that addresses the Swin Transformer encoder's deficiency in local information processing through a local feature extraction module. Additionally, it features a multiscale nonparametric merging attention module to enhance feature-channel correlations. The network also incorporates a pyramid large-kernel convolution module, which replaces the traditional 3 × 3 convolution in the decoder with a multibranch large-kernel convolution, thereby achieving a large receptive field while preserving fine detail. Comparative analyses on three public building datasets demonstrated the model's superior segmentation performance and robustness: LMSwin_PNet achieved an IoU of 72.35% on the Massachusetts Building Dataset, 91.30% on the WHU Building Dataset, and 78.99% on the Inria Aerial Image Labeling Dataset. Its outputs closely match the ground-truth labels, indicating its potential for broader application in remote sensing image segmentation tasks. The source code will be freely available at https://github.com/ziyanpeng/pzy.
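To make the decoder design concrete, the sketch below shows a multibranch large-kernel convolution block of the kind the abstract describes as the pyramid large-kernel convolution module. This is a minimal PyTorch illustration, not the authors' implementation: the kernel sizes (3, 7, 11), the depthwise branches, the summation-based fusion, and the 1 × 1 mixing convolution are all assumptions, since the abstract states only that a multibranch large-kernel convolution replaces the decoder's 3 × 3 convolution.

```python
import torch
import torch.nn as nn

class PyramidLargeKernelConv(nn.Module):
    """Minimal sketch of a multibranch large-kernel convolution block.

    Assumptions (not from the paper): three depthwise branches with
    kernel sizes 3/7/11, summation-based fusion, and a 1x1 mixing conv.
    """

    def __init__(self, channels, kernel_sizes=(3, 7, 11)):
        super().__init__()
        # One depthwise branch per kernel size; padding keeps the
        # spatial resolution unchanged for odd kernel sizes.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )
        # 1x1 convolution mixes channels after the branches are fused.
        self.fuse = nn.Conv2d(channels, channels, 1)
        self.norm = nn.BatchNorm2d(channels)
        self.act = nn.GELU()

    def forward(self, x):
        # Sum the multiscale branch outputs, then mix and normalize.
        out = sum(branch(x) for branch in self.branches)
        return self.act(self.norm(self.fuse(out)))

# Usage: a drop-in replacement for a 3x3 convolution in a decoder stage.
feats = torch.randn(1, 96, 64, 64)
block = PyramidLargeKernelConv(96)
print(block(feats).shape)  # torch.Size([1, 96, 64, 64])
```

Summing depthwise branches keeps the parameter count close to that of a single standard convolution while widening the receptive field, which matches the large-receptive-field motivation stated in the abstract.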

This article is part of the following collections:
Integration of Advanced Machine/Deep Learning Models and GIS

Acknowledgments

We would like to thank the anonymous reviewers for their constructive and valuable suggestions on earlier drafts of this manuscript.

Author contributions

W.Z. and X.Z. designed and completed the experiments and wrote the paper. N.H. revised the paper and analyzed the data. Y.X. and T.C. supervised the study. Y.L. and Y.H. guided the process and helped with the writing of the paper. All authors have read and agreed to the published version of the manuscript.

Data availability statement

The data used in this study are from open-source datasets: the Road and Building Detection Datasets (toronto.edu), the WHU Building Dataset (https://gpcv.whu.edu.cn/data/building_dataset.html), and the Inria Aerial Image Labeling Dataset (all accessed on 17 January 2024).

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This research was supported by the National Natural Science Foundation of China (grant number 42371441) and the Scientific Innovation Program Project by the Shanghai Committee of Science and Technology (grant number 20dz1206501).