
Reward estimation with scheduled knowledge distillation for dialogue policy learning

Article: 2174078 | Received 09 Oct 2022, Accepted 24 Jan 2023, Published online: 07 Feb 2023

References

  • Bengio, Y., Louradour, J., Collobert, R., & Weston, J. (2009). Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 41–48).
  • Budzianowski, P., Wen, T. H., Tseng, B. H., Casanueva, I., Ultes, S., Ramadan, O., & Gasic, M. (2018). MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (pp. 5016–5026).
  • Chen, H., Liu, X., Yin, D., & Tang, J. (2017). A survey on dialogue systems: Recent advances and new frontiers. Acm Sigkdd Explorations Newsletter, 19(2), 25–35. https://doi.org/10.1145/3166054.3166058
  • Chu, G., Wang, X., Shi, C., & Jiang, X. (2021). CuCo: Graph Representation with Curriculum Contrastive Learning. In IJCAI (pp. 2300–2306).
  • Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014, December). Empirical evaluation of gated recurrent neural networks on sequence modeling. In NIPS Workshop on Deep Learning.
  • Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., & Bharath, A. A. (2018). Generative adversarial networks: An overview. IEEE Signal Processing Magazine, 35(1), 53–65. https://doi.org/10.1109/MSP.2017.2765202
  • Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • Dhingra, B., Li, L., Li, X., Gao, J., Chen, Y. N., Ahmad, F., & Deng, L. (2017). Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Vol. 1, Long papers) (pp. 484–495).
  • Diakoloukas, V., Lygerakis, F., Lagoudakis, M. G., & Kotti, M. (2020). Variational denoising autoencoders and least-Squares policy iteration for statistical dialogue managers. IEEE Signal Processing Letters, 27, 960–964. https://doi.org/10.1109/LSP.97
  • Dong, X., Long, C., Xu, W., & Xiao, C. (2021). Dual graph convolutional networks with transformer and curriculum learning for image captioning. In Proceedings of the 29th ACM International Conference on Multimedia (pp. 2615–2624).
  • El-Bouri, R., Eyre, D., Watkinson, P., Zhu, T., & Clifton, D. (2020). Student-teacher curriculum learning via reinforcement learning: predicting hospital inpatient admission location. In International Conference on Machine Learning (pp. 2848–2857).
  • Geishauser, C., van Niekerk, C., Lin, H. C., Lubis, N., Heck, M., Feng, S., & Gasic, M. (2022). Dynamic Dialogue Policy for Continual Reinforcement Learning. In Proceedings of the 29th International Conference on Computational Linguistics (pp. 266–284).
  • Gou, J., Yu, B., Maybank, S. J., & Tao, D. (2021). Knowledge distillation: A survey. International Journal of Computer Vision, 129(6), 1789–1819. https://doi.org/10.1007/s11263-021-01453-z
  • Greco, A., Saggese, A., Vento, M., & Vigilante, V. (2021). Effective training of convolutional neural networks for age estimation based on knowledge distillation. Neural Computing and Applications, 34(24), 1–16.
  • Guo, S., Huang, W., Zhang, H., Zhuang, C., Dong, D., Scott, M. R., & Huang, D. (2018). CurriculumNet: Weakly supervised learning from large-scale web images. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 135–150).
  • Haidar, M., & Rezagholizadeh, M. (2019). TextKD-GAN: Text generation using knowledge distillation and generative adversarial networks. In Canadian Conference on Artificial Intelligence (pp. 107–118).
  • Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
  • Hosseini-Asl, E., McCann, B., Wu, C. S., Yavuz, S., & Socher, R. (2020). A simple language model for task-oriented dialogue. Advances in Neural Information Processing Systems, 33, 20179–20191.
  • Jeon, H., & Lee, G. G. (2022). DORA: Towards policy optimization for task-oriented dialogue system with efficient context. Computer Speech & Language, 72, 101310. https://doi.org/10.1016/j.csl.2021.101310
  • Khandelwal, A. (2021). WeaSuL: Weakly Supervised Dialogue Policy Learning: Reward Estimation for Multi-turn Dialogue. In Proceedings of the 14th International Conference on Natural Language Generation (pp. 64–75).
  • Li, X., Chen, Y. N., Li, L., Gao, J., & Celikyilmaz, A. (2017). End-to-End Task-Completion Neural Dialogue Systems. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Vol. 1, Long papers) (pp. 733–743).
  • Li, Z., Kiseleva, J., & de Rijke, M. (2020). Rethinking supervised learning and reinforcement learning in task-oriented dialogue systems. arXiv preprint arXiv:2009.09781.
  • Li, X., Lipton, Z. C., Dhingra, B., Li, L., Gao, J., & Chen, Y. N. (2016). A user simulator for task-completion dialogues. arXiv preprint arXiv:1612.05688.
  • Li, B., Wang, Z., Liu, H., Du, Q., Xiao, T., Zhang, C., & Zhu, J. (2021). Learning Light-Weight Translation Models from Deep Transformer. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35, pp. 13217–13225).
  • Li, X., Wang, Y., Sun, S., Panda, S., Liu, J., & Gao, J. (2018). Microsoft dialogue challenge: Building end-to-end task-completion dialogue systems. arXiv preprint arXiv:1807.11125.
  • Lipton, Z. C., Gao, J., Li, L., Li, X., Ahmed, F., & Deng, L. (2016). Efficient exploration for dialogue policy learning with BBQ networks & replay buffer spiking. arXiv preprint arXiv:1608.05081.
  • Liu, J., Chen, Y., & Liu, K. (2019). Exploiting the ground-truth: An adversarial imitation based knowledge distillation approach for event detection. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, pp. 6754–6761).
  • Liu, S., Zhang, J., He, K., Xu, W., & Zhou, J. (2021). Scheduled Dialog Policy Learning: An Automatic Curriculum Learning Framework for Task-oriented Dialog System. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 1091–1102).
  • Lu, K., Zhang, S., & Chen, X. (2019). Goal-oriented dialogue policy learning from failures. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, pp. 2596–2603).
  • Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., & Petersen, S. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533. https://doi.org/10.1038/nature14236
  • Narvekar, S., & Stone, P. (2019). Learning Curriculum Policies for Reinforcement Learning. In Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems (pp. 25–33).
  • Peng, B., Li, X., Gao, J., Liu, J., Chen, Y. N., & Wong, K. F. (2018). Adversarial advantage actor-critic model for task-completion dialogue policy learning. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6149–6153).
  • Peng, B., Li, X., Gao, J., Liu, J., & Wong, K. F. (2018). Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Vol. 1, Long papers) (pp. 2182–2192).
  • Platanios, E. A., Stretcu, O., Neubig, G., Póczos, B., & Mitchell, T. (2019). Competence-based Curriculum Learning for Neural Machine Translation. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Vol. 1, Long and short papers) (pp. 1162–1172).
  • Qu, M., Tang, J., & Han, J. (2018). Curriculum learning for heterogeneous star network embedding via deep reinforcement learning. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (pp. 468–476).
  • Ren, S., Guo, K., Ma, J., Zhu, F., Hu, B., & Zhou, H. (2021). Realistic medical image super-resolution with pyramidal feature multi-distillation networks for intelligent healthcare systems. Neural Computing and Applications, 1–16. https://doi.org/10.1007/s00521-021-06287-x
  • Romero, A., Ballas, N., Kahou, S. E., Chassang, A., Gatta, C., & Bengio, Y. (2014). FitNets: Hints for thin deep nets. arXiv preprint arXiv:1412.6550.
  • Schatzmann, J., & Young, S. (2009). The hidden agenda user simulation model. IEEE Transactions on Audio, Speech, and Language Processing, 17(4), 733–747. https://doi.org/10.1109/TASL.2008.2012071
  • Shen, Y., Xu, X., & Cao, J. (2020). Reconciling predictive and interpretable performance in repeat buyer prediction via model distillation and heterogeneous classifiers fusion. Neural Computing and Applications, 32(13), 9495–9508. https://doi.org/10.1007/s00521-019-04462-9
  • Su, P. H., Gašić, M., & Young, S. (2018). Reward estimation for dialogue policy optimisation. Computer Speech & Language, 51, 24–43. https://doi.org/10.1016/j.csl.2018.02.003
  • Su, S. Y., Li, X., Gao, J., Liu, J., & Chen, Y. N. (2018). Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing.
  • Sutton, R. S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Machine Learning Proceedings (pp. 216–224). Elsevier.
  • Tian, Z., Bi, W., Lee, D., Xue, L., Song, Y., Liu, X., & Zhang, N. L. (2020). Response-Anticipated Memory for On-Demand Knowledge Integration in Response Generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 650–659).
  • Tian, C., Yin, W., & Moens, M. F. (2022). Anti-Overestimation Dialogue Policy Learning for Task-Completion Dialogue System. In Findings of the Association for Computational Linguistics: NAACL 2022 (pp. 565–577).
  • Wang, X., Chen, Y., & Zhu, W. (2021). A survey on curriculum learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9), 4555–4576.
  • Wang, X., Fu, T., Liao, S., Wang, S., Lei, Z., & Mei, T. (2020). Exclusivity-consistency regularized knowledge distillation for face recognition. In European Conference on Computer Vision (pp. 325–342).
  • Wang, Y., Wang, W., Liang, Y., Cai, Y., & Hooi, B. (2021). CurGraph: Curriculum learning for graph classification. In Proceedings of the Web Conference (pp. 1238–1248).
  • Wu, G., Fang, W., Wang, J., Cao, J., Bao, W., Ping, Y., & Wang, Z. (2021). Gaussian Process based Deep Dyna-Q approach for Dialogue Policy Learning. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 1786–1795).
  • Wu, W., Guo, Z., Zhou, X., Wu, H., Zhang, X., Lian, R., & Wang, H. (2019). Proactive Human-Machine Conversation with Explicit Conversation Goal. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 3794–3804).
  • Zhang, R., Wang, Z., Zheng, M., Zhao, Y., & Huang, Z. (2021). Emotion-sensitive deep dyna-Q learning for task-completion dialogue policy learning. Neurocomputing, 459, 122–130. https://doi.org/10.1016/j.neucom.2021.06.075
  • Zhao, Y., Qin, H., Zhenyu, W., Zhu, C., & Wang, S. (2022). A Versatile Adaptive Curriculum Learning Framework for Task-oriented Dialogue Policy Learning. In Findings of the Association for Computational Linguistics: NAACL 2022 (pp. 711–723).
  • Zhao, Y., Wang, Z., & Huang, Z. (2021). Automatic Curriculum Learning With Over-repetition Penalty for Dialogue Policy Learning.
  • Zhao, Y., Wang, Z., Yin, K., Zhang, R., Huang, Z., & Wang, P. (2020). Dynamic Reward-Based Dueling Deep Dyna-Q: Robust Policy Learning in Noisy Environments. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, pp. 9676–9684).
  • Zhao, Y., Wang, Z., Zhu, C., Wang, S., Moens, M. F., & Huang, X. (2021). Efficient dialogue complementary policy learning via deep q-network policy and episodic memory policy. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 4311–4323).
  • Zhou, X., Zhu, F., & Zhao, P. (2022). Predicting before acting: improving policy quality by taking a vision of consequence. Connection Science, 34(1), 608–629. https://doi.org/10.1080/09540091.2022.2025765
  • Zhu, Q., Zhang, Z., Fang, Y., Li, X., Takanobu, R., Li, J., & Huang, M. (2020). ConvLab-2: An open-source toolkit for building, evaluating, and diagnosing dialogue systems. arXiv preprint arXiv:2002.04793.
  • Zhu, H., Zhao, Y., & Qin, H. (2021). Cold-started Curriculum Learning for Task-oriented Dialogue Policy. In IEEE International Conference on E-Business Engineering (ICEBE) (pp. 100–105).