Research Article

Action-guided CycleGAN for Bi-directional Video Prediction


References

  • H. Sang, Z. Zhao, and D. He, “Two-level attention model based video action recognition network,” IEEE Access, Vol. 7, pp. 118388–401, 2019.
  • S. P. Sahoo, and S. Ari, “On an algorithm for human action recognition,” Expert Syst. Appl., Vol. 115, pp. 524–34, 2019. Available: http://www.sciencedirect.com/science/article/pii/S0957417418305220.
  • L. Wang, Y. Xiong, Z. Wang, and Y. Qiao, “Towards good practices for very deep two-stream convnets,” CoRR, Vol. abs/1507.02159, 2015. Available: http://arxiv.org/abs/1507.02159.
  • Y. L. Chang, C. S. Chan, and P. Remagnino, “Action recognition on continuous video,” Neural Comput. Appl., Vol. 33, no. 4, pp. 1233–43, Jun. 2021. DOI:10.1007/s00521-020-04982-9.
  • J. Guo, H. Bai, Z. Tang, P. Xu, D. Gan, and B. Liu, “Multimodal human action recognition for video content matching,” Multimed. Tools. Appl., Vol. 79, no. 45-46, pp. 34665–83, 2020. DOI: 10.1007/s11042-020-08998-0.
  • A. De, I. Valera, N. Ganguly, S. Bhattacharya, and M. Gomez Rodriguez, “Learning and forecasting opinion dynamics in social networks,” in Advances in Neural Information Processing Systems, D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett, Eds. Curran Associates, Inc., 2016, vol. 29, pp. 397–405. https://proceedings.neurips.cc/paper/2016/file/f340f1b1f65b6df5b5e3f94d95b11daf-Paper.pdf.
  • X. Liang, L. Lee, W. Dai, and E. P. Xing, “Dual motion gan for future-flow embedded video prediction,” in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 1762–70.
  • S. Tulyakov, M. Liu, X. Yang, and J. Kautz, “Mocogan: Decomposing motion and content for video generation,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 1526–35.
  • R. Villegas, J. Yang, S. Hong, X. Lin, and H. Lee, “Decomposing motion and content for natural video sequence prediction,” in International Conference on Learning Representations (ICLR), 2017.
  • M. Mathieu, C. Couprie, and Y. LeCun, “Deep multi-scale video prediction beyond mean square error,” in 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, Y. Bengio, and Y. LeCun, Eds. 2016. http://arxiv.org/abs/1511.05440
  • N. Srivastava, E. Mansimov, and R. Salakhudinov, “Unsupervised learning of video representations using lstms,” in Proceedings of the 32nd International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, F. Bach, and D. Blei, Eds. Lille, France: PMLR, 07–09 Jul 2015, vol. 37, pp. 843–52. http://proceedings.mlr.press/v37/srivastava15.html.
  • Z. Liu, R. A. Yeh, X. Tang, Y. Liu, and A. Agarwala, “Video frame synthesis using deep voxel flow,” in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 4473–81.
  • N. Sedaghat, M. Zolfaghari, and T. Brox, “Hybrid learning of optical flow and next frame prediction to boost optical flow in the wild,” arXiv:1612.03777, Tech. Rep., 2017. Available: https://arxiv.org/abs/1612.03777. http://lmb.informatik.uni-freiburg.de/Publications/2017/SZB17.
  • T. Xue, J. Wu, K. L. Bouman, and W. T. Freeman, “Visual dynamics: Stochastic future generation via layered cross convolutional networks,” IEEE Trans. Pattern Anal. Mach. Intell., Vol. 41, no. 9, pp. 2236–50, 2019.
  • J. Walker, K. Marino, A. Gupta, and M. Hebert, “The pose knows: Video forecasting by generating pose futures,” in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 3352–61.
  • R. Villegas, J. Yang, Y. Zou, S. Sohn, X. Lin, and H. Lee, “Learning to generate long-term future via hierarchical prediction,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, ser. ICML’17. JMLR.org, 2017, pp. 3560–9.
  • T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 4396–405.
  • J. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2242–51.
  • C. Chan, S. Ginosar, T. Zhou, and A. Efros, “Everybody dance now,” in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 5932–41.
  • A. Siarohin, S. Lathuilière, S. Tulyakov, E. Ricci, and N. Sebe, “Animating arbitrary objects via deep motion transfer,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 2372–81.
  • L. Yu, W. Zhang, J. Wang, and Y. Yu, “Seqgan: Sequence generative adversarial nets with policy gradient,” in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, ser. AAAI’17, AAAI Press, 2017, pp. 2852–8.
  • A. van den Oord, et al., “Wavenet: A generative model for raw audio,” arXiv:1609.03499, 2016. Available: http://arxiv.org/abs/1609.03499.
  • T. Wang, M. Liu, J. Zhu, A. Tao, J. Kautz, and B. Catanzaro, “High-resolution image synthesis and semantic manipulation with conditional gans,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 8798–807.
  • Z. Yi, H. Zhang, P. Tan, and M. Gong, “Dualgan: Unsupervised dual learning for image-to-image translation,” in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2868–76.
  • L. Song, Z. Lu, R. He, Z. Sun, and T. Tan, “Geometry guided adversarial facial expression synthesis,” in Proceedings of the 26th ACM International Conference on Multimedia, ser. MM ‘18, New York, NY, USA: Association for Computing Machinery, 2018, pp. 627–35. DOI:10.1145/3240508.3240612
  • H. Tang, W. Wang, D. Xu, Y. Yan, and N. Sebe, “Gesturegan for hand gesture-to-gesture translation in the wild,” in Proceedings of the 26th ACM International Conference on Multimedia, ser. MM ‘18, New York, NY, USA: Association for Computing Machinery, 2018, pp. 774–82. DOI:10.1145/3240508.3240704
  • R. Wu, X. Gu, X. Tao, X. Shen, Y.-W. Tai, and J. Jia, “Landmark assisted cyclegan for cartoon face generation,” ArXiv, Vol. abs/1907.01424, 2019.
  • A. Siarohin, E. Sangineto, S. Lathuilière, and N. Sebe, “Deformable gans for pose-based human image generation,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 3408–16.
  • H. Tang, D. Xu, G. Liu, W. Wang, N. Sebe, and Y. Yan, “Cycle in cycle generative adversarial networks for keypoint-guided image generation,” in Proceedings of the 27th ACM International Conference on Multimedia, ser. MM ‘19, New York, NY, USA: Association for Computing Machinery, 2019, pp. 2052–60. DOI:10.1145/3343031.3350980
  • H. Chiu, E. Adeli, B. Wang, D. Huang, and J. C. Niebles, “Action-agnostic human pose forecasting,” in 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), 2019, pp. 1423–32.
  • P. Luc, N. Neverova, C. Couprie, J. Verbeek, and Y. LeCun, “Predicting deeper into the future of semantic segmentation,” in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 648–57.
  • A. Terwilliger, G. Brazil, and X. Liu, “Recurrent flow-guided semantic forecasting,” in 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), 2019, pp. 1703–12.
  • X. Chen, W. Wang, J. Wang, and W. Li, “Learning object-centric transformation for video prediction,” in Proceedings of the 25th ACM International Conference on Multimedia, ser. MM ‘17, New York, NY, USA: Association for Computing Machinery, 2017, pp. 1503–12. DOI:10.1145/3123266.3123349
  • N. Wichers, R. Villegas, D. Erhan, and H. Lee, “Hierarchical long-term video prediction without supervision,” in ICML, 2018.
  • J. Zhang, Y. Wang, M. Long, W. Jianmin, and P. S. Yu, “Z-order recurrent neural networks for video prediction,” in 2019 IEEE International Conference on Multimedia and Expo (ICME), 2019, pp. 230–5.
  • M. Li, C. Yuan, Z. Lin, Z. Zheng, and Y. Cheng, “Stochastic video generation with disentangled representations,” in 2019 IEEE International Conference on Multimedia and Expo (ICME), 2019, pp. 224–9.
  • S. Hochreiter, and J. Schmidhuber, “Long short-term memory,” Neural Comput., Vol. 9, no. 8, pp. 1735–80, Nov. 1997. DOI: 10.1162/neco.1997.9.8.1735.
  • K. Cho, B. van Merrienboer, C. Gülçehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, A. Moschitti, B. Pang, and W. Daelemans, Eds. ACL, 2014, pp. 1724–34. DOI:10.3115/v1/d14-1179
  • X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-k. Wong, and W.-c. Woo, “Convolutional lstm network: A machine learning approach for precipitation nowcasting,” in Proceedings of the 28th International Conference on Neural Information Processing Systems, Volume 1, ser. NIPS’15, C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, R. Garnett, Eds. Cambridge, MA, USA: MIT Press, 2015, pp. 802–10.
  • M. Ranzato, A. Szlam, J. Bruna, M. Mathieu, R. Collobert, and S. Chopra, “Video (language) modeling: a baseline for generative models of natural videos,” CoRR, Vol. abs/1412.6604, 2014. Available: http://arxiv.org/abs/1412.6604.
  • C. Vondrick, H. Pirsiavash, and A. Torralba, “Anticipating visual representations from unlabeled video,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, IEEE Computer Society, 2016, pp. 98–106. DOI:10.1109/CVPR.2016.18
  • K. M. Kitani, D.-A. Huang, and W.-C. Ma, “Chapter 12 - activity forecasting: An invitation to predictive perception,” in Group and Crowd Behavior for Computer Vision, V. Murino, M. Cristani, S. Shah, and S. Savarese, Eds. Academic Press, 2017, pp. 273–94. https://www.sciencedirect.com/science/article/pii/B978012809276700014X.
  • D.-A. Huang, and K. M. Kitani, “Action-reaction: Forecasting the dynamics of human interaction,” in Computer Vision – ECCV 2014, Springer International Publishing, 2014, pp. 489–504. DOI:10.1007/978-3-319-10584-0_32
  • S. L. Pintea, J. C. van Gemert, and A. W. M. Smeulders, “Déjà vu: Motion prediction in static images,” in Computer Vision – ECCV 2014, Springer International Publishing, 2014, pp. 172–87. DOI:10.1007/978-3-319-10578-9_12
  • J. Yuen, and A. Torralba, “A data-driven approach for event prediction,” in Computer Vision – ECCV 2010, Springer Berlin Heidelberg, 2010, pp. 707–20. DOI:10.1007/978-3-642-15552-9_51
  • J. Bütepage, M. J. Black, D. Kragic, and H. Kjellström, “Deep representation learning for human motion prediction and classification,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1591–9.
  • I. Goodfellow, et al., “Generative adversarial nets,” in Advances in Neural Information Processing Systems, Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, vol. 27, pp. 2672–80. Available: https://proceedings.neurips.cc/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf.
  • M. Mirza, and S. Osindero, “Conditional generative adversarial nets,” 2014, arXiv:1411.1784. Available: http://arxiv.org/abs/1411.1784.
  • T. Miyato, and M. Koyama, “cGANs with projection discriminator,” ArXiv, Vol. abs/1802.05637, 2018.
  • A. Odena, C. Olah, and J. Shlens, “Conditional image synthesis with auxiliary classifier gans,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, ser. ICML’17. JMLR.org, 2017, pp. 2642–51.
  • E. Barsoum, J. Kender, and Z. Liu, “Hp-gan: Probabilistic 3d human motion prediction via gan,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018, pp. 1499–149909.
  • I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, ser. NIPS’14, Cambridge, MA, USA: MIT Press, 2014, pp. 3104–12.
  • Z. Cao, G. Hidalgo, T. Simon, S. E. Wei, and Y. Sheikh, “Openpose: Realtime multi-person 2d pose estimation using part affinity fields,” IEEE Trans. Pattern Anal. Mach. Intell., Vol. 43, no. 1, pp. 172–86, 2021.
  • Y. Chen, C. Shen, X. Wei, L. Liu, and J. Yang, “Adversarial posenet: A structure-aware convolutional network for human pose estimation,” in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 1221–30.
  • Y. Cao, O. Canévet, and J. Odobez, “Leveraging convolutional pose machines for fast and accurate head pose estimation,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018, pp. 1089–94.
  • N. K. Yadav, S. K. Singh, and S. R. Dubey, “CSA-GAN: Cyclic synthesized attention guided generative adversarial network for face synthesis,” Appl. Intell., Vol. 52, no. 11, pp. 12704–23, Feb. 2022. DOI:10.1007/s10489-021-03064-0.
  • N. K. Yadav, S. K. Singh, and S. R. Dubey, “TVA-GAN: attention guided generative adversarial network for thermal to visible image transformations,” Neural Comput. Appl., Vol. 35, no. 27, pp. 19729–49, Jul. 2023. DOI:10.1007/s00521-023-08724-5.
  • N. K. Yadav, S. K. Singh, and S. R. Dubey, “MobileAR-GAN: Mobilenet-based efficient attentive recurrent generative adversarial network for infrared-to-visual transformations,” IEEE Trans. Instrum. Meas., Vol. 71, pp. 1–9, 2022. DOI:10.1109/TIM.2022.3166202.
  • L. Ma, X. Jia, Q. Sun, B. Schiele, T. Tuytelaars, and L. Van Gool, “Pose guided person image generation,” in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds. Curran Associates, Inc., 2017, vol. 30, pp. 406–16. Available: https://proceedings.neurips.cc/paper/2017/file/34ed066df378efacc9b924ec161e7639-Paper.pdf.
  • W. Lotter, G. Kreiman, and D. Cox, “Deep predictive coding networks for video prediction and unsupervised learning,” in International Conference on Learning Representations (ICLR), 2017, pp. 1–18.
  • Y. H. Kwon, and M. G. Park, “Predicting future frames using retrospective cycle GAN,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 1811–20. DOI:10.1109/CVPR.2019.00191.
  • Z. Zhang, L. Yang, and Y. Zheng, “Translating and segmenting multimodal medical volumes with cycle- and shape-consistency generative adversarial network,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 9242–51.
  • J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in Computer Vision – ECCV 2016, Springer International Publishing, 2016, pp. 694–711. DOI:10.1007/978-3-319-46475-6_43
  • P. Isola, J. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 5967–76.
  • M. S. Ryoo, and J. K. Aggarwal, “UT-Interaction Dataset, ICPR contest on Semantic Description of Human Activities (SDHA),” 2010. Available: http://cvrc.ece.utexas.edu/SDHA2010/Human_Interaction.html.
  • Y. Lin, L. Zheng, Z. Zheng, Y. Wu, Z. Hu, C. Yan, and Y. Yang, “Improving person re-identification by attribute and identity learning,” Pattern Recognit., 2019.
