Research Article

Action-guided CycleGAN for Bi-directional Video Prediction


References

  • H. Sang, Z. Zhao, and D. He, “Two-level attention model based video action recognition network,” IEEE Access, Vol. 7, pp. 118388–401, 2019.
  • S. P. Sahoo, and S. Ari, “On an algorithm for human action recognition,” Expert Syst. Appl., Vol. 115, pp. 524–34, 2019. Available: http://www.sciencedirect.com/science/article/pii/S0957417418305220.
  • L. Wang, Y. Xiong, Z. Wang, and Y. Qiao, “Towards good practices for very deep two-stream convnets,” CoRR, Vol. abs/1507.02159, 2015. Available: http://arxiv.org/abs/1507.02159.
  • Y. L. Chang, C. S. Chan, and P. Remagnino, “Action recognition on continuous video,” Neural Comput. Appl., Vol. 33, no. 4, pp. 1233–43, Jun. 2021. DOI:10.1007/s00521-020-04982-9.
  • J. Guo, H. Bai, Z. Tang, P. Xu, D. Gan, and B. Liu, “Multimodal human action recognition for video content matching,” Multimed. Tools. Appl., Vol. 79, no. 45-46, pp. 34665–83, 2020. DOI: 10.1007/s11042-020-08998-0.
  • A. De, I. Valera, N. Ganguly, S. Bhattacharya, and M. Gomez Rodriguez, “Learning and forecasting opinion dynamics in social networks,” in Advances in Neural Information Processing Systems, D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett, Eds. Curran Associates, Inc., 2016, vol. 29, pp. 397–405. https://proceedings.neurips.cc/paper/2016/file/f340f1b1f65b6df5b5e3f94d95b11daf-Paper.pdf.
  • X. Liang, L. Lee, W. Dai, and E. P. Xing, “Dual motion gan for future-flow embedded video prediction,” in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 1762–70.
  • S. Tulyakov, M. Liu, X. Yang, and J. Kautz, “Mocogan: Decomposing motion and content for video generation,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 1526–35.
  • R. Villegas, J. Yang, S. Hong, X. Lin, and H. Lee, “Decomposing motion and content for natural video sequence prediction,” in International Conference on Learning Representations (ICLR), 2017.
  • M. Mathieu, C. Couprie, and Y. LeCun, “Deep multi-scale video prediction beyond mean square error,” in 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, Y. Bengio, and Y. LeCun, Eds. 2016. http://arxiv.org/abs/1511.05440
  • N. Srivastava, E. Mansimov, and R. Salakhudinov, “Unsupervised learning of video representations using lstms,” in Proceedings of the 32nd International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, F. Bach, and D. Blei, Eds. Lille, France: PMLR, 07–09 Jul 2015, vol. 37, pp. 843–52. http://proceedings.mlr.press/v37/srivastava15.html.
  • Z. Liu, R. A. Yeh, X. Tang, Y. Liu, and A. Agarwala, “Video frame synthesis using deep voxel flow,” in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 4473–81.
  • N. Sedaghat, M. Zolfaghari, and T. Brox, “Hybrid learning of optical flow and next frame prediction to boost optical flow in the wild,” arXiv:1612.03777, Tech. Rep., 2017. Available: https://arxiv.org/abs/1612.03777. http://lmb.informatik.uni-freiburg.de/Publications/2017/SZB17.
  • T. Xue, J. Wu, K. L. Bouman, and W. T. Freeman, “Visual dynamics: Stochastic future generation via layered cross convolutional networks,” IEEE Trans. Pattern Anal. Mach. Intell., Vol. 41, no. 9, pp. 2236–50, 2019.
  • J. Walker, K. Marino, A. Gupta, and M. Hebert, “The pose knows: Video forecasting by generating pose futures,” in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 3352–61.
  • R. Villegas, J. Yang, Y. Zou, S. Sohn, X. Lin, and H. Lee, “Learning to generate long-term future via hierarchical prediction,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, ser. ICML’17. JMLR.org, 2017, pp. 3560–9.
  • T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 4396–405.
  • J. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2242–51.
  • C. Chan, S. Ginosar, T. Zhou, and A. Efros, “Everybody dance now,” in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 5932–41.
  • A. Siarohin, S. Lathuilière, S. Tulyakov, E. Ricci, and N. Sebe, “Animating arbitrary objects via deep motion transfer,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 2372–81.
  • L. Yu, W. Zhang, J. Wang, and Y. Yu, “Seqgan: Sequence generative adversarial nets with policy gradient,” in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, ser. AAAI’17, AAAI Press, 2017, pp. 2852–8.
  • A. van den Oord, et al., “Wavenet: A generative model for raw audio,” arXiv:1609.03499, 2016. Available: http://arxiv.org/abs/1609.03499.
  • T. Wang, M. Liu, J. Zhu, A. Tao, J. Kautz, and B. Catanzaro, “High-resolution image synthesis and semantic manipulation with conditional gans,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 8798–807.
  • Z. Yi, H. Zhang, P. Tan, and M. Gong, “Dualgan: Unsupervised dual learning for image-to-image translation,” in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2868–76.
  • L. Song, Z. Lu, R. He, Z. Sun, and T. Tan, “Geometry guided adversarial facial expression synthesis,” in Proceedings of the 26th ACM International Conference on Multimedia, ser. MM ‘18, New York, NY, USA: Association for Computing Machinery, 2018, pp. 627–35. DOI:10.1145/3240508.3240612
  • H. Tang, W. Wang, D. Xu, Y. Yan, and N. Sebe, “Gesturegan for hand gesture-to-gesture translation in the wild,” in Proceedings of the 26th ACM International Conference on Multimedia, ser. MM ‘18, New York, NY, USA: Association for Computing Machinery, 2018, pp. 774–82. DOI:10.1145/3240508.3240704
  • R. Wu, X. Gu, X. Tao, X. Shen, Y.-W. Tai, and J. Jia, “Landmark assisted cyclegan for cartoon face generation,” ArXiv, Vol. abs/1907.01424, 2019.
  • A. Siarohin, E. Sangineto, S. Lathuilière, and N. Sebe, “Deformable gans for pose-based human image generation,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 3408–16.
  • H. Tang, D. Xu, G. Liu, W. Wang, N. Sebe, and Y. Yan, “Cycle in cycle generative adversarial networks for keypoint-guided image generation,” in Proceedings of the 27th ACM International Conference on Multimedia, ser. MM ‘19, New York, NY, USA: Association for Computing Machinery, 2019, pp. 2052–60. DOI:10.1145/3343031.3350980
  • H. Chiu, E. Adeli, B. Wang, D. Huang, and J. C. Niebles, “Action-agnostic human pose forecasting,” in 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), 2019, pp. 1423–32.
  • P. Luc, N. Neverova, C. Couprie, J. Verbeek, and Y. LeCun, “Predicting deeper into the future of semantic segmentation,” in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 648–57.
  • A. Terwilliger, G. Brazil, and X. Liu, “Recurrent flow-guided semantic forecasting,” in 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), 2019, pp. 1703–12.
  • X. Chen, W. Wang, J. Wang, and W. Li, “Learning object-centric transformation for video prediction,” in Proceedings of the 25th ACM International Conference on Multimedia, ser. MM ‘17, New York, NY, USA: Association for Computing Machinery, 2017, pp. 1503–12. DOI:10.1145/3123266.3123349
  • N. Wichers, R. Villegas, D. Erhan, and H. Lee, “Hierarchical long-term video prediction without supervision,” in ICML, 2018.
  • J. Zhang, Y. Wang, M. Long, W. Jianmin, and P. S. Yu, “Z-order recurrent neural networks for video prediction,” in 2019 IEEE International Conference on Multimedia and Expo (ICME), 2019, pp. 230–5.
  • M. Li, C. Yuan, Z. Lin, Z. Zheng, and Y. Cheng, “Stochastic video generation with disentangled representations,” in 2019 IEEE International Conference on Multimedia and Expo (ICME), 2019, pp. 224–9.
  • S. Hochreiter, and J. Schmidhuber, “Long short-term memory,” Neural Comput., Vol. 9, no. 8, pp. 1735–80, Nov. 1997. DOI: 10.1162/neco.1997.9.8.1735.
  • K. Cho, B. van Merrienboer, C. Gülçehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, A. Moschitti, B. Pang, and W. Daelemans, Eds. ACL, 2014, pp. 1724–34. DOI:10.3115/v1/d14-1179
  • X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-k. Wong, and W.-c. Woo, “Convolutional lstm network: A machine learning approach for precipitation nowcasting,” in Proceedings of the 28th International Conference on Neural Information Processing Systems, Volume 1, ser. NIPS’15, C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, R. Garnett, Eds. Cambridge, MA, USA: MIT Press, 2015, pp. 802–10.
  • M. Ranzato, A. Szlam, J. Bruna, M. Mathieu, R. Collobert, and S. Chopra, “Video (language) modeling: a baseline for generative models of natural videos,” CoRR, Vol. abs/1412.6604, 2014. Available: http://arxiv.org/abs/1412.6604.
  • C. Vondrick, H. Pirsiavash, and A. Torralba, “Anticipating visual representations from unlabeled video,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, IEEE Computer Society, 2016, pp. 98–106. DOI:10.1109/CVPR.2016.18
  • K. M. Kitani, D.-A. Huang, and W.-C. Ma, “Chapter 12 - activity forecasting: An invitation to predictive perception,” in Group and Crowd Behavior for Computer Vision, V. Murino, M. Cristani, S. Shah, and S. Savarese, Eds. Academic Press, 2017, pp. 273–94. https://www.sciencedirect.com/science/article/pii/B978012809276700014X.
  • D.-A. Huang, and K. M. Kitani, “Action-reaction: Forecasting the dynamics of human interaction,” in Computer Vision – ECCV 2014, Springer International Publishing, 2014, pp. 489–504. DOI:10.1007/978-3-319-10584-0_32
  • S. L. Pintea, J. C. van Gemert, and A. W. M. Smeulders, “Déjà vu: Motion prediction in static images,” in Computer Vision – ECCV 2014, Springer International Publishing, 2014, pp. 172–87. DOI:10.1007/978-3-319-10578-9_12
  • J. Yuen, and A. Torralba, “A data-driven approach for event prediction,” in Computer Vision – ECCV 2010, Springer Berlin Heidelberg, 2010, pp. 707–20. DOI:10.1007/978-3-642-15552-9_51
  • J. Bütepage, M. J. Black, D. Kragic, and H. Kjellström, “Deep representation learning for human motion prediction and classification,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1591–9.
  • I. Goodfellow, et al., “Generative adversarial nets,” in Advances in Neural Information Processing Systems, Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, vol. 27, pp. 2672–80. Available: https://proceedings.neurips.cc/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf.
  • M. Mirza, and S. Osindero, “Conditional generative adversarial nets,” 2014, arXiv:1411.1784. Available: http://arxiv.org/abs/1411.1784.
  • T. Miyato, and M. Koyama, “cGANs with projection discriminator,” ArXiv, Vol. abs/1802.05637, 2018.
  • A. Odena, C. Olah, and J. Shlens, “Conditional image synthesis with auxiliary classifier gans,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, ser. ICML’17. JMLR.org, 2017, pp. 2642–51.
  • E. Barsoum, J. Kender, and Z. Liu, “Hp-gan: Probabilistic 3d human motion prediction via gan,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018, pp. 1499–149909.
  • I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, ser. NIPS’14, Cambridge, MA, USA: MIT Press, 2014, pp. 3104–12.
  • Z. Cao, G. Hidalgo, T. Simon, S. E. Wei, and Y. Sheikh, “Openpose: Realtime multi-person 2d pose estimation using part affinity fields,” IEEE Trans. Pattern Anal. Mach. Intell., Vol. 43, no. 1, pp. 172–86, 2021.
  • Y. Chen, C. Shen, X. Wei, L. Liu, and J. Yang, “Adversarial posenet: A structure-aware convolutional network for human pose estimation,” in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 1221–30.
  • Y. Cao, O. Canévet, and J. Odobez, “Leveraging convolutional pose machines for fast and accurate head pose estimation,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018, pp. 1089–94.
  • N. K. Yadav, S. K. Singh, and S. R. Dubey, “CSA-GAN: Cyclic synthesized attention guided generative adversarial network for face synthesis,” Appl. Intell., Vol. 52, no. 11, pp. 12704–23, Feb. 2022. DOI:10.1007/s10489-021-03064-0.
  • N. K. Yadav, S. K. Singh, and S. R. Dubey, “TVA-GAN: attention guided generative adversarial network for thermal to visible image transformations,” Neural Comput. Appl., Vol. 35, no. 27, pp. 19729–49, Jul. 2023. DOI:10.1007/s00521-023-08724-5.
  • N. K. Yadav, S. K. Singh, and S. R. Dubey, “MobileAR-GAN: Mobilenet-based efficient attentive recurrent generative adversarial network for infrared-to-visual transformations,” IEEE Trans. Instrum. Meas., Vol. 71, pp. 1–9, 2022. DOI:10.1109/TIM.2022.3166202.
  • L. Ma, X. Jia, Q. Sun, B. Schiele, T. Tuytelaars, and L. Van Gool, “Pose guided person image generation,” in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds. Curran Associates, Inc., 2017, vol. 30, pp. 406–16. Available: https://proceedings.neurips.cc/paper/2017/file/34ed066df378efacc9b924ec161e7639-Paper.pdf.
  • W. Lotter, G. Kreiman, and D. Cox, “Deep predictive coding networks for video prediction and unsupervised learning,” in International Conference on Learning Representations (ICLR), 2017, pp. 1–18.
  • Y. H. Kwon, and M. G. Park, “Predicting future frames using retrospective cycle GAN,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 1811–20. DOI:10.1109/CVPR.2019.00191.
  • Z. Zhang, L. Yang, and Y. Zheng, “Translating and segmenting multimodal medical volumes with cycle- and shape-consistency generative adversarial network,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 9242–51.
  • J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in Computer Vision – ECCV 2016, Springer International Publishing, 2016, pp. 694–711. DOI:10.1007/978-3-319-46475-6_43
  • P. Isola, J. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 5967–76.
  • M. S. Ryoo, and J. K. Aggarwal, “UT-Interaction Dataset, ICPR contest on Semantic Description of Human Activities (SDHA),” 2010. Available: http://cvrc.ece.utexas.edu/SDHA2010/Human_Interaction.html.
  • Y. Lin, L. Zheng, Z. Zheng, Y. Wu, Z. Hu, C. Yan, and Y. Yang, “Improving person re-identification by attribute and identity learning,” Pattern Recognit., 2019.
