International Journal of Digital Earth Volume 16, 2023 - Issue 2

Open access

Research Article

MC-Net: multi-scale contextual information aggregation network for image captioning on remote sensing images

Haiyan Huang, State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, People's Republic of China. ORCID: https://orcid.org/0000-0002-9931-9884

Zhenfeng Shao (corresponding author), State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, People's Republic of China. ORCID: https://orcid.org/0000-0003-4587-6826

Qimin Cheng, School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, People's Republic of China

Xiao Huang, Department of Geosciences, University of Arkansas, Fayetteville, USA. ORCID: https://orcid.org/0000-0002-4323-382X

Xiaoping Wu, School of Geography and Resources Science, Sichuan Normal University, Sichuan, People's Republic of China

Guoming Li, School of Resources and Environment, University of Electronic Science and Technology, Sichuan, People's Republic of China

Li Tan, School of Geophysics, Chengdu University of Technology, Sichuan, People's Republic of China

Pages 4848-4866 | Received 28 Aug 2023, Accepted 08 Nov 2023, Published online: 28 Nov 2023

https://doi.org/10.1080/17538947.2023.2283482

References

Anderson, Peter, Basura Fernando, Mark Johnson, and Stephen Gould. 2016. “Spice: Semantic Propositional Image Caption Evaluation.” In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part V 14, 382–398. Springer.
Anderson, Peter, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. 2018. “Bottom-up and Top-down Attention for Image Captioning and Visual Question Answering.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6077–6086.
Aneja, Jyoti, Aditya Deshpande, and Alexander G. Schwing. 2018. “Convolutional Image Captioning.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5561–5570.
Banerjee, Satanjeev, and Alon Lavie. 2005. “METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments.” In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, 65–72.
Cheng, Qimin, Haiyan Huang, Yuan Xu, Yuzhuo Zhou, Huanying Li, and Zhongyuan Wang. 2022. “NWPU-Captions Dataset and MLCA-Net for Remote Sensing Image Captioning.” IEEE Transactions on Geoscience and Remote Sensing 60:1–19.
Cheng, Qimin, Haiyan Huang, Lan Ye, Peng Fu, Deqiao Gan, and Yuzhuo Zhou. 2021. “A Semantic-Preserving Deep Hashing Model for Multi-Label Remote Sensing Image Retrieval.” Remote Sensing 13 (24): 4965. https://doi.org/10.3390/rs13244965.
Du, Shouji, Shihong Du, Bo Liu, and Xiuyuan Zhang. 2021. “Incorporating DeepLabv3+ and Object-based Image Analysis for Semantic Segmentation of Very High Resolution Remote Sensing Images.” International Journal of Digital Earth 14 (3): 357–378. https://doi.org/10.1080/17538947.2020.1831087.
Farhadi, Ali, Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, and David Forsyth. 2010. “Every Picture Tells a Story: Generating Sentences from Images.” In Computer Vision–ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, Part IV 11, 15–29. Springer.
Feng, Jiangfan, Panyu Chen, Zhujun Gu, Maimai Zeng, and Wei Zheng. 2023. “MDSNet: A Multiscale Decoupled Supervision Network for Semantic Segmentation of Remote Sensing Images.” International Journal of Digital Earth 16 (1): 2844–2861. https://doi.org/10.1080/17538947.2023.2241435.
Fu, Kun, Yang Li, Wenkai Zhang, Hongfeng Yu, and Xian Sun. 2020. “Boosting Memory with a Persistent Memory Mechanism for Remote Sensing Image Captioning.” Remote Sensing 12 (11): 1874. https://doi.org/10.3390/rs12111874.
Hoxha, Genc, and Farid Melgani. 2020. “Remote Sensing Image Captioning with SVM-based Decoding.” In IGARSS 2020-2020 IEEE International Geoscience and Remote Sensing Symposium, 6734–6737. IEEE.
Hoxha, Genc, and Farid Melgani. 2022. “A Novel SVM-Based Decoder for Remote Sensing Image Captioning.” IEEE Transactions on Geoscience and Remote Sensing 60:1–14.
Hoxha, Genc, Farid Melgani, and Begüm Demir. 2020. “Toward Remote Sensing Image Retrieval Under a Deep Image Captioning Perspective.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 13:4462–4475. https://doi.org/10.1109/JSTARS.4609443.
Hoxha, Genc, Giacomo Scuccato, and Farid Melgani. 2023. “Improving Image Captioning Systems With Postprocessing Strategies.” IEEE Transactions on Geoscience and Remote Sensing 61:1–13. https://doi.org/10.1109/TGRS.2023.3281334.
Huang, Wei, Qi Wang, and Xuelong Li. 2020. “Denoising-Based Multiscale Feature Fusion for Remote Sensing Image Captioning.” IEEE Geoscience and Remote Sensing Letters 18 (3): 436–440. https://doi.org/10.1109/LGRS.8859.
Kulkarni, Girish, Visruth Premraj, Vicente Ordonez, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C. Berg, and Tamara L. Berg. 2013. “Babytalk: Understanding and Generating Simple Image Descriptions.” IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (12): 2891–2903. https://doi.org/10.1109/TPAMI.2012.162.
Li, Yunpeng, Xiangrong Zhang, Jing Gu, Chen Li, Xin Wang, Xu Tang, and Licheng Jiao. 2021. “Recurrent Attention and Semantic Gate for Remote Sensing Image Captioning.” IEEE Transactions on Geoscience and Remote Sensing 60:1–16.
Li, Xuelong, Xueting Zhang, Wei Huang, and Qi Wang. 2020. “Truncation Cross Entropy Loss for Remote Sensing Image Captioning.” IEEE Transactions on Geoscience and Remote Sensing 59 (6): 5246–5257. https://doi.org/10.1109/TGRS.2020.3010106.
Liu, Qingrong, Chengqing Ruan, Shan Zhong, Jian Li, Zhonghui Yin, and Xihu Lian. 2018. “Risk Assessment of Storm Surge Disaster Based on Numerical Models and Remote Sensing.” International Journal of Applied Earth Observation and Geoinformation 68:20–30. https://doi.org/10.1016/j.jag.2018.01.016.
Lu, Xiaoqiang, Binqiang Wang, and Xiangtao Zheng. 2019. “Sound Active Attention Framework for Remote Sensing Image Captioning.” IEEE Transactions on Geoscience and Remote Sensing 58 (3): 1985–2000. https://doi.org/10.1109/TGRS.36.
Lu, Xiaoqiang, Binqiang Wang, Xiangtao Zheng, and Xuelong Li. 2017. “Exploring Models and Data for Remote Sensing Image Caption Generation.” IEEE Transactions on Geoscience and Remote Sensing 56 (4): 2183–2195. https://doi.org/10.1109/TGRS.2017.2776321.
Lu, Jiasen, Caiming Xiong, Devi Parikh, and Richard Socher. 2017. “Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 375–383.
Ordonez, Vicente, Girish Kulkarni, and Tamara Berg. 2011. “Im2text: Describing Images Using 1 Million Captioned Photographs.” In Advances in Neural Information Processing Systems, 24.
Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. “Bleu: A Method for Automatic Evaluation of Machine Translation.” In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 311–318.
Qu, Bo, Xuelong Li, Dacheng Tao, and Xiaoqiang Lu. 2016. “Deep Semantic Understanding of High Resolution Remote Sensing Image.” In 2016 International Conference on Computer, Information and Telecommunication Systems (CITS), 1–5. IEEE.
Lin, Chin-Yew. 2004. “ROUGE: A Package for Automatic Evaluation of Summaries.” In Proceedings of Workshop on Text Summarization of ACL, Spain, Vol. 5.
Shi, Zhenwei, and Zhengxia Zou. 2017. “Can a Machine Generate Humanlike Language Descriptions for a Remote Sensing Image?” IEEE Transactions on Geoscience and Remote Sensing 55 (6): 3623–3634. https://doi.org/10.1109/TGRS.2017.2677464.
Sumbul, Gencer, Sonali Nayak, and Begüm Demir. 2020. “SD-RSIC: Summarization-Driven Deep Remote Sensing Image Captioning.” IEEE Transactions on Geoscience and Remote Sensing 59 (8): 6922–6934. https://doi.org/10.1109/TGRS.2020.3031111.
Ushiku, Yoshitaka, Masataka Yamaguchi, Yusuke Mukuta, and Tatsuya Harada. 2015. “Common Subspace for Model and Similarity: Phrase Learning for Caption Generation from Images.” In Proceedings of the IEEE International Conference on Computer Vision, 2668–2676.
Vedantam, Ramakrishna, C. Lawrence Zitnick, and Devi Parikh. 2015. “Cider: Consensus-based Image Description Evaluation.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4566–4575.
Vinyals, Oriol, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2015. “Show and Tell: A Neural Image Caption Generator.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3156–3164.
Wang, Qi, Wei Huang, Xueting Zhang, and Xuelong Li. 2020. “Word–Sentence Framework for Remote Sensing Image Captioning.” IEEE Transactions on Geoscience and Remote Sensing 59 (12): 10532–10543. https://doi.org/10.1109/TGRS.2020.3044054.
Wang, Binqiang, Xiaoqiang Lu, Xiangtao Zheng, and Xuelong Li. 2019. “Semantic Descriptions of High-Resolution Remote Sensing Images.” IEEE Geoscience and Remote Sensing Letters 16 (8): 1274–1278. https://doi.org/10.1109/LGRS.8859.
Wang, Shuang, Xiutiao Ye, Yu Gu, Jihui Wang, Yun Meng, Jingxian Tian, Biao Hou, and Licheng Jiao. 2022. “Multi-Label Semantic Feature Fusion for Remote Sensing Image Captioning.” ISPRS Journal of Photogrammetry and Remote Sensing 184:1–18. https://doi.org/10.1016/j.isprsjprs.2021.11.020.
Wang, Yong, Wenkai Zhang, Zhengyuan Zhang, Xin Gao, and Xian Sun. 2022. “Multiscale Multiinteraction Network for Remote Sensing Image Captioning.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 15:2154–2165. https://doi.org/10.1109/JSTARS.2022.3153636.
Wang, Binqiang, Xiangtao Zheng, Bo Qu, and Xiaoqiang Lu. 2020. “Retrieval Topic Recurrent Memory Network for Remote Sensing Image Captioning.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 13:256–270. https://doi.org/10.1109/JSTARS.4609443.
Xu, Kelvin, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention.” In International Conference on Machine Learning, 2048–2057. PMLR.
Yang, Yi, and Shawn Newsam. 2010. “Bag-of-Visual-Words and Spatial Extensions for Land-use Classification.” In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, 270–279.
Yang, Qiaoqiao, Zihao Ni, and Peng Ren. 2022. “Meta Captioning: A Meta Learning Based Remote Sensing Image Captioning Framework.” ISPRS Journal of Photogrammetry and Remote Sensing 186:190–200. https://doi.org/10.1016/j.isprsjprs.2022.02.001.
Yuan, Zhenghang, Xuelong Li, and Qi Wang. 2019. “Exploring Multi-Level Attention and Semantic Relationship for Remote Sensing Image Captioning.” IEEE Access 8:2608–2620. https://doi.org/10.1109/Access.6287639.
Zhang, Zhengyuan, Wenhui Diao, Wenkai Zhang, Menglong Yan, Xin Gao, and Xian Sun. 2019. “LAM: Remote Sensing Image Captioning with Label-Attention Mechanism.” Remote Sensing 11 (20): 2349. https://doi.org/10.3390/rs11202349.
Zhang, Xueting, Qi Wang, Shangdong Chen, and Xuelong Li. 2019. “Multi-scale Cropping Mechanism for Remote Sensing Image Captioning.” In IGARSS 2019-2019 IEEE International Geoscience and Remote Sensing Symposium, 10039–10042. IEEE.
Zhang, Xiangrong, Xin Wang, Xu Tang, Huiyu Zhou, and Chen Li. 2019. “Description Generation for Remote Sensing Images Using Attribute Attention Mechanism.” Remote Sensing 11 (6): 612. https://doi.org/10.3390/rs11060612.
Zhang, Lu, Jianming Zhang, Zhe Lin, Huchuan Lu, and You He. 2019. “Capsal: Leveraging Captioning to Boost Semantics for Salient Object Detection.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 6024–6033.
Zhang, Zhengyuan, Wenkai Zhang, Menglong Yan, Xin Gao, Kun Fu, and Xian Sun. 2021. “Global Visual Feature and Linguistic State Guided Attention for Remote Sensing Image Captioning.” IEEE Transactions on Geoscience and Remote Sensing 60:1–16. https://doi.org/10.1109/TGRS.2020.3040221.
