ABSTRACT
Building segmentation from remote sensing images, which is essential in land use and urban planning, is evolving with advances in deep learning. Conventional methods based on convolutional neural networks are limited in integrating local and global information and in establishing long-range dependencies, which results in suboptimal segmentation in complex scenarios. This paper proposes LMSwin_PNet, a novel segmentation network that addresses the Swin Transformer encoder's deficiency in local information processing through a local feature extraction module. It also features a multiscale nonparametric merging attention module to enhance feature-channel correlations. In addition, the network incorporates a pyramid large-kernel convolution module, which replaces the traditional 3 × 3 convolution in the decoder with multibranch large-kernel convolution, thereby achieving a large receptive field while capturing detailed information. Comparative analyses on three public building datasets demonstrated the model's superior segmentation performance and robustness. LMSwin_PNet achieved an IoU of 72.35% on the Massachusetts Building Dataset, 91.30% on the WHU Building Dataset, and 78.99% on the Inria aerial-image building dataset. The results show that LMSwin_PNet produced outputs closely matching the labels, indicating its potential for broader application in remote sensing image segmentation tasks. The source code will be freely available at https://github.com/ziyanpeng/pzy.
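For readers unfamiliar with the reported metric, the following is a minimal sketch of intersection over union (IoU) for a binary building mask, using the standard definition IoU = TP / (TP + FP + FN). It is illustrative only: the function name `binary_iou`, the nested-list mask format, and the handling of two empty masks are assumptions, and this section does not state how the paper aggregates IoU across images.

```python
def binary_iou(pred, label):
    """IoU between two binary masks given as equal-sized nested lists of 0/1.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    intersection = 0  # pixels marked as building in both masks (TP)
    union = 0         # pixels marked as building in either mask (TP + FP + FN)
    for pred_row, label_row in zip(pred, label):
        for p, t in zip(pred_row, label_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumption: two empty masks are treated as a perfect match.
    return intersection / union if union else 1.0


# Toy 2 x 3 masks: 1 = building, 0 = background.
pred = [[1, 1, 0],
        [0, 1, 0]]
label = [[1, 0, 0],
         [0, 1, 1]]
print(f"IoU = {binary_iou(pred, label):.2%}")
```

In this toy case two pixels are building in both masks and four are building in at least one, so the IoU is 2/4.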
Acknowledgments
We would like to thank the anonymous reviewers for their constructive and valuable suggestions on earlier drafts of this manuscript.
Author contributions
W.Z. and X.Z. designed and completed the experiments and wrote the paper. N.H. revised the paper and analyzed the data. Y.X. and T.C. supervised the study. Y.L. and Y.H. guided the process and helped with the writing of the paper. All authors have read and agreed to the published version of the manuscript.
Data availability statement
The data used in this study are from open-source datasets. The datasets can be downloaded from the Road and Building Detection Datasets page (toronto.edu), from https://gpcv.whu.edu.cn/data/building_dataset.html, and from the Inria Aerial Image Labeling Dataset download page (accessed on 17 January 2024).
Disclosure statement
No potential conflict of interest was reported by the author(s).