Research Article

MoCoUTRL: a momentum contrastive framework for unsupervised text representation learning

Article: 2221406 | Received 20 Feb 2023, Accepted 31 May 2023, Published online: 16 Jun 2023
