A method for building extraction in remote sensing images based on swintransformer

Weidong Zhu (a, b, c)

Xiaolong Zhu (a), corresponding author

Naiying He (a, b)

Yuelin Xu (a)

Tiantian Cao (a)

Yifei Li (a)

Yanying Huang (a)

(a) School of Marine Science and Ecological Environment, Shanghai Ocean University, Shanghai, People’s Republic of China

(b) Shanghai Estuary Marine Surveying and Mapping Engineering Technology Research Center, Shanghai, People’s Republic of China

(c) Key Laboratory of Marine Ecological Monitoring and Restoration Technologies, Shanghai, People’s Republic of China

ABSTRACT

Building segmentation in remote sensing images, which is essential for land use and urban planning, is advancing with deep learning. Conventional methods based on convolutional neural networks are limited in integrating local and global information and in establishing long-range dependencies, which leads to suboptimal segmentation in complex scenes. This paper proposes LMSwin_PNet, a segmentation network that compensates for the SwinTransformer encoder's weakness in local information processing with a local feature extraction module. It also includes a multiscale nonparametric merging attention module to enhance feature-channel correlations. In addition, the network incorporates a pyramid large-kernel convolution module, which replaces the traditional 3 × 3 convolution in the decoder with multibranch large-kernel convolution to obtain a large receptive field while capturing detailed information. Comparative analyses on three public building datasets demonstrated the model's superior segmentation performance and robustness. LMSwin_PNet produced outputs that closely matched the labels, indicating its potential for broader application in remote sensing image segmentation tasks. It achieved an IoU of 72.35% on the Massachusetts Building Dataset, 91.30% on the WHU Building Dataset, and 78.99% on the Inria aerial-image building dataset. The source code will be freely available at https://github.com/ziyanpeng/pzy.
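
For readers unfamiliar with the IoU (intersection over union) figures quoted in the abstract, the following is a minimal sketch of how the metric is commonly computed for a binary building mask. It is not taken from the authors' code: the function name `binary_iou`, the list-of-lists mask representation, and the convention for two empty masks are all assumptions made for illustration. The abstract does not say whether the reported IoU is aggregated per image or over the whole test set.

```python
def binary_iou(pred, label):
    """IoU between two binary masks given as equally sized 2-D lists of 0/1.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    intersection = 0
    union = 0
    for pred_row, label_row in zip(pred, label):
        for p, t in zip(pred_row, label_row):
            # A pixel is in the intersection if both masks mark it as building.
            intersection += 1 if (p and t) else 0
            # A pixel is in the union if either mask marks it as building.
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Toy 2 x 3 masks: 1 = building pixel, 0 = background.
pred = [[1, 1, 0],
        [0, 1, 0]]
label = [[1, 0, 0],
         [0, 1, 1]]
iou = binary_iou(pred, label)
print(f"IoU = {iou:.2%}")
```

In this toy case two pixels are marked as building in both masks and four pixels are marked in at least one, so the ratio is 2/4.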

Acknowledgments

We would like to thank the anonymous reviewers for their constructive and valuable suggestions on earlier drafts of this manuscript.

Author contributions

W.Z. and X.Z. designed and completed the experiments and wrote the paper. N.H. revised the paper and analyzed the data. Y.X. and T.C. supervised the study. Y.L. and Y.H. guided the process and helped with the writing of the paper. All authors have read and agreed to the published version of the manuscript.

Data availability statement

The data used in this study are from open-source datasets. The Massachusetts Building Dataset can be downloaded from the Road and Building Detection Datasets page (toronto.edu), the WHU Building Dataset from https://gpcv.whu.edu.cn/data/building_dataset.html, and the Inria dataset from the Inria Aerial Image Labeling Dataset download page (accessed on 17 January 2024).

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This research was supported by the National Natural Science Foundation of China (grant number 42371441) and the Scientific Innovation Program Project by the Shanghai Committee of Science and Technology (grant number 20dz1206501).
