Research Article

MC-Net: multi-scale contextual information aggregation network for image captioning on remote sensing images

Pages 4848-4866 | Received 28 Aug 2023, Accepted 08 Nov 2023, Published online: 28 Nov 2023

References

  • Anderson, Peter, Basura Fernando, Mark Johnson, and Stephen Gould. 2016. “SPICE: Semantic Propositional Image Caption Evaluation.” In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part V, 382–398. Springer.
  • Anderson, Peter, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. 2018. “Bottom-up and Top-down Attention for Image Captioning and Visual Question Answering.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6077–6086.
  • Aneja, Jyoti, Aditya Deshpande, and Alexander G. Schwing. 2018. “Convolutional Image Captioning.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5561–5570.
  • Banerjee, Satanjeev, and Alon Lavie. 2005. “METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments.” In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, 65–72.
  • Cheng, Qimin, Haiyan Huang, Yuan Xu, Yuzhuo Zhou, Huanying Li, and Zhongyuan Wang. 2022. “NWPU-Captions Dataset and MLCA-Net for Remote Sensing Image Captioning.” IEEE Transactions on Geoscience and Remote Sensing 60:1–19.
  • Cheng, Qimin, Haiyan Huang, Lan Ye, Peng Fu, Deqiao Gan, and Yuzhuo Zhou. 2021. “A Semantic-Preserving Deep Hashing Model for Multi-Label Remote Sensing Image Retrieval.” Remote Sensing 13 (24): 4965. https://doi.org/10.3390/rs13244965.
  • Du, Shouji, Shihong Du, Bo Liu, and Xiuyuan Zhang. 2021. “Incorporating DeepLabv3+ and Object-based Image Analysis for Semantic Segmentation of Very High Resolution Remote Sensing Images.” International Journal of Digital Earth 14 (3): 357–378. https://doi.org/10.1080/17538947.2020.1831087.
  • Farhadi, Ali, Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, and David Forsyth. 2010. “Every Picture Tells a Story: Generating Sentences from Images.” In Computer Vision–ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5–11, 2010, Proceedings, Part IV, 15–29. Springer.
  • Feng, Jiangfan, Panyu Chen, Zhujun Gu, Maimai Zeng, and Wei Zheng. 2023. “MDSNet: A Multiscale Decoupled Supervision Network for Semantic Segmentation of Remote Sensing Images.” International Journal of Digital Earth 16 (1): 2844–2861. https://doi.org/10.1080/17538947.2023.2241435.
  • Fu, Kun, Yang Li, Wenkai Zhang, Hongfeng Yu, and Xian Sun. 2020. “Boosting Memory with a Persistent Memory Mechanism for Remote Sensing Image Captioning.” Remote Sensing 12 (11): 1874. https://doi.org/10.3390/rs12111874.
  • Hoxha, Genc, and Farid Melgani. 2020. “Remote Sensing Image Captioning with SVM-based Decoding.” In IGARSS 2020-2020 IEEE International Geoscience and Remote Sensing Symposium, 6734–6737. IEEE.
  • Hoxha, Genc, and Farid Melgani. 2022. “A Novel SVM-Based Decoder for Remote Sensing Image Captioning.” IEEE Transactions on Geoscience and Remote Sensing 60:1–14.
  • Hoxha, Genc, Farid Melgani, and Begüm Demir. 2020. “Toward Remote Sensing Image Retrieval Under a Deep Image Captioning Perspective.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 13:4462–4475. https://doi.org/10.1109/JSTARS.4609443.
  • Hoxha, Genc, Giacomo Scuccato, and Farid Melgani. 2023. “Improving Image Captioning Systems With Postprocessing Strategies.” IEEE Transactions on Geoscience and Remote Sensing 61:1–13. https://doi.org/10.1109/TGRS.2023.3281334.
  • Huang, Wei, Qi Wang, and Xuelong Li. 2020. “Denoising-Based Multiscale Feature Fusion for Remote Sensing Image Captioning.” IEEE Geoscience and Remote Sensing Letters 18 (3): 436–440. https://doi.org/10.1109/LGRS.8859.
  • Kulkarni, Girish, Visruth Premraj, Vicente Ordonez, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C. Berg, and Tamara L. Berg. 2013. “BabyTalk: Understanding and Generating Simple Image Descriptions.” IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (12): 2891–2903. https://doi.org/10.1109/TPAMI.2012.162.
  • Li, Yunpeng, Xiangrong Zhang, Jing Gu, Chen Li, Xin Wang, Xu Tang, and Licheng Jiao. 2021. “Recurrent Attention and Semantic Gate for Remote Sensing Image Captioning.” IEEE Transactions on Geoscience and Remote Sensing 60:1–16.
  • Li, Xuelong, Xueting Zhang, Wei Huang, and Qi Wang. 2020. “Truncation Cross Entropy Loss for Remote Sensing Image Captioning.” IEEE Transactions on Geoscience and Remote Sensing 59 (6): 5246–5257. https://doi.org/10.1109/TGRS.2020.3010106.
  • Liu, Qingrong, Chengqing Ruan, Shan Zhong, Jian Li, Zhonghui Yin, and Xihu Lian. 2018. “Risk Assessment of Storm Surge Disaster Based on Numerical Models and Remote Sensing.” International Journal of Applied Earth Observation and Geoinformation 68:20–30. https://doi.org/10.1016/j.jag.2018.01.016.
  • Lu, Xiaoqiang, Binqiang Wang, and Xiangtao Zheng. 2019. “Sound Active Attention Framework for Remote Sensing Image Captioning.” IEEE Transactions on Geoscience and Remote Sensing 58 (3): 1985–2000. https://doi.org/10.1109/TGRS.36.
  • Lu, Xiaoqiang, Binqiang Wang, Xiangtao Zheng, and Xuelong Li. 2017. “Exploring Models and Data for Remote Sensing Image Caption Generation.” IEEE Transactions on Geoscience and Remote Sensing 56 (4): 2183–2195. https://doi.org/10.1109/TGRS.2017.2776321.
  • Lu, Jiasen, Caiming Xiong, Devi Parikh, and Richard Socher. 2017. “Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 375–383.
  • Ordonez, Vicente, Girish Kulkarni, and Tamara Berg. 2011. “Im2Text: Describing Images Using 1 Million Captioned Photographs.” In Advances in Neural Information Processing Systems 24.
  • Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. “BLEU: A Method for Automatic Evaluation of Machine Translation.” In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 311–318.
  • Qu, Bo, Xuelong Li, Dacheng Tao, and Xiaoqiang Lu. 2016. “Deep Semantic Understanding of High Resolution Remote Sensing Image.” In 2016 International Conference on Computer, Information and Telecommunication Systems (CITS), 1–5. IEEE.
  • Lin, Chin-Yew. 2004. “ROUGE: A Package for Automatic Evaluation of Summaries.” In Proceedings of the Workshop on Text Summarization of ACL, Spain, Vol. 5.
  • Shi, Zhenwei, and Zhengxia Zou. 2017. “Can a Machine Generate Humanlike Language Descriptions for a Remote Sensing Image?” IEEE Transactions on Geoscience and Remote Sensing 55 (6): 3623–3634. https://doi.org/10.1109/TGRS.2017.2677464.
  • Sumbul, Gencer, Sonali Nayak, and Begüm Demir. 2020. “SD-RSIC: Summarization-Driven Deep Remote Sensing Image Captioning.” IEEE Transactions on Geoscience and Remote Sensing 59 (8): 6922–6934. https://doi.org/10.1109/TGRS.2020.3031111.
  • Ushiku, Yoshitaka, Masataka Yamaguchi, Yusuke Mukuta, and Tatsuya Harada. 2015. “Common Subspace for Model and Similarity: Phrase Learning for Caption Generation from Images.” In Proceedings of the IEEE International Conference on Computer Vision, 2668–2676.
  • Vedantam, Ramakrishna, C. Lawrence Zitnick, and Devi Parikh. 2015. “CIDEr: Consensus-Based Image Description Evaluation.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4566–4575.
  • Vinyals, Oriol, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2015. “Show and Tell: A Neural Image Caption Generator.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3156–3164.
  • Wang, Qi, Wei Huang, Xueting Zhang, and Xuelong Li. 2020. “Word–Sentence Framework for Remote Sensing Image Captioning.” IEEE Transactions on Geoscience and Remote Sensing 59 (12): 10532–10543. https://doi.org/10.1109/TGRS.2020.3044054.
  • Wang, Binqiang, Xiaoqiang Lu, Xiangtao Zheng, and Xuelong Li. 2019. “Semantic Descriptions of High-Resolution Remote Sensing Images.” IEEE Geoscience and Remote Sensing Letters 16 (8): 1274–1278. https://doi.org/10.1109/LGRS.8859.
  • Wang, Shuang, Xiutiao Ye, Yu Gu, Jihui Wang, Yun Meng, Jingxian Tian, Biao Hou, and Licheng Jiao. 2022. “Multi-Label Semantic Feature Fusion for Remote Sensing Image Captioning.” ISPRS Journal of Photogrammetry and Remote Sensing 184:1–18. https://doi.org/10.1016/j.isprsjprs.2021.11.020.
  • Wang, Yong, Wenkai Zhang, Zhengyuan Zhang, Xin Gao, and Xian Sun. 2022. “Multiscale Multiinteraction Network for Remote Sensing Image Captioning.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 15:2154–2165. https://doi.org/10.1109/JSTARS.2022.3153636.
  • Wang, Binqiang, Xiangtao Zheng, Bo Qu, and Xiaoqiang Lu. 2020. “Retrieval Topic Recurrent Memory Network for Remote Sensing Image Captioning.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 13:256–270. https://doi.org/10.1109/JSTARS.4609443.
  • Xu, Kelvin, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Rich Zemel, and Yoshua Bengio. 2015. “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention.” In International Conference on Machine Learning, 2048–2057. PMLR.
  • Yang, Yi, and Shawn Newsam. 2010. “Bag-of-Visual-Words and Spatial Extensions for Land-use Classification.” In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, 270–279.
  • Yang, Qiaoqiao, Zihao Ni, and Peng Ren. 2022. “Meta Captioning: A Meta Learning Based Remote Sensing Image Captioning Framework.” ISPRS Journal of Photogrammetry and Remote Sensing 186:190–200. https://doi.org/10.1016/j.isprsjprs.2022.02.001.
  • Yuan, Zhenghang, Xuelong Li, and Qi Wang. 2019. “Exploring Multi-Level Attention and Semantic Relationship for Remote Sensing Image Captioning.” IEEE Access 8:2608–2620. https://doi.org/10.1109/Access.6287639.
  • Zhang, Zhengyuan, Wenhui Diao, Wenkai Zhang, Menglong Yan, Xin Gao, and Xian Sun. 2019. “LAM: Remote Sensing Image Captioning with Label-Attention Mechanism.” Remote Sensing 11 (20): 2349. https://doi.org/10.3390/rs11202349.
  • Zhang, Xueting, Qi Wang, Shangdong Chen, and Xuelong Li. 2019. “Multi-scale Cropping Mechanism for Remote Sensing Image Captioning.” In IGARSS 2019-2019 IEEE International Geoscience and Remote Sensing Symposium, 10039–10042. IEEE.
  • Zhang, Xiangrong, Xin Wang, Xu Tang, Huiyu Zhou, and Chen Li. 2019. “Description Generation for Remote Sensing Images Using Attribute Attention Mechanism.” Remote Sensing 11 (6): 612. https://doi.org/10.3390/rs11060612.
  • Zhang, Lu, Jianming Zhang, Zhe Lin, Huchuan Lu, and You He. 2019. “CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 6024–6033.
  • Zhang, Zhengyuan, Wenkai Zhang, Menglong Yan, Xin Gao, Kun Fu, and Xian Sun. 2021. “Global Visual Feature and Linguistic State Guided Attention for Remote Sensing Image Captioning.” IEEE Transactions on Geoscience and Remote Sensing 60:1–16. https://doi.org/10.1109/TGRS.2020.3040221.