Research Article

MoCoUTRL: a momentum contrastive framework for unsupervised text representation learning

Article: 2221406 | Received 20 Feb 2023, Accepted 31 May 2023, Published online: 16 Jun 2023
