
Keyphrase Extraction Using Enhanced Word and Document Embedding

Fahd Saleh Alotaibi, Saurabh Sharma, Vishal Gupta & Savita Gupta
Pages 8876-8888 | Published online: 07 Aug 2022

ABSTRACT

In all areas of human activity, information is increasingly available as digital text documents. Because these documents are growing so quickly, the knowledge they contain needs to be explored systematically. Keywords make it more effective to search for a document of interest. They highlight a document's primary concepts, so a researcher can quickly tell whether a text matches their interest. In this article, we propose an unsupervised keyphrase extraction approach that combines word n-grams with state-of-the-art word and document embeddings. We also propose a new method for composing document vectors from the vectors of important words and their idf scores. Specifically, we use higher-order word n-grams to improve several unigram embeddings, and we introduce a novel way to produce document embeddings for document representation. The proposed embeddings are evaluated on four datasets. For keyphrase extraction, the best results come from combining GloVe retrofitted with higher-order word n-grams and the proposed document embedding. The bigram-retrofitted embedding significantly improves results over the baseline approaches.
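The abstract does not specify how the authors select important words, how candidate phrases are represented, or how the n-gram retrofitting step works. The sketch below illustrates only the general idea of embedding-based keyphrase ranking, in the spirit of rankers such as EmbedRank. It assumes the following:

- The document vector is an idf-weighted mean of the `top_k` highest-idf words.
- Each candidate n-gram is represented by the plain mean of its word vectors.
- Candidates are ranked by cosine similarity to the document vector.

All function names are hypothetical. The embeddings are toy 2-D stand-ins for pretrained vectors such as GloVe, and no retrofitting is performed.

```python
import math
from collections import Counter

# Hypothetical sketch: names, the word-selection rule and the weighting are
# assumptions for illustration, not the method of the article above.


def idf_scores(corpus):
    """idf(t) = ln(N / df(t)) + 1 over a corpus of tokenized documents."""
    n_docs = len(corpus)
    df = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log(n_docs / count) + 1.0 for tok, count in df.items()}


def weighted_mean(vectors, weights):
    """Weighted average of equal-length vectors."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def document_vector(doc, embeddings, idf, top_k=5):
    """Assumed composition: idf-weighted mean of the top_k highest-idf words."""
    vocab = {w for w in doc if w in embeddings and w in idf}
    if not vocab:
        raise ValueError("no word has both an embedding and an idf score")
    important = sorted(vocab, key=lambda w: (-idf[w], w))[:top_k]
    return weighted_mean([embeddings[w] for w in important],
                         [idf[w] for w in important])


def candidate_phrases(doc, max_n=2):
    """All contiguous word n-grams with 1 <= n <= max_n."""
    return {tuple(doc[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(doc) - n + 1)}


def rank_keyphrases(doc, embeddings, idf, max_n=2, top_k=5):
    """Score each candidate by cosine similarity between its mean word vector
    and the document vector; return (score, phrase) pairs, best first."""
    doc_vec = document_vector(doc, embeddings, idf, top_k)
    scored = []
    for phrase in candidate_phrases(doc, max_n):
        if all(w in embeddings for w in phrase):
            vecs = [embeddings[w] for w in phrase]
            phrase_vec = weighted_mean(vecs, [1.0] * len(vecs))
            scored.append((cosine(phrase_vec, doc_vec), " ".join(phrase)))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return scored


if __name__ == "__main__":
    # Toy 2-D vectors standing in for pretrained word embeddings.
    emb = {"neural": [1.0, 0.1], "network": [0.9, 0.2], "training": [0.8, 0.3],
           "weather": [0.0, 1.0], "report": [0.1, 0.9]}
    corpus = [["neural", "network", "training", "neural", "network"],
              ["weather", "report"], ["neural", "weather"]]
    for score, phrase in rank_keyphrases(corpus[0], emb, idf_scores(corpus)):
        print(f"{score:.3f}  {phrase}")
```

In a real pipeline, `emb` would be replaced by the (possibly n-gram-retrofitted) embedding table. The candidates would usually be filtered first, for example by part-of-speech patterns and stopword removal. Both steps are omitted here to keep the sketch short.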

ACKNOWLEDGEMENT

The authors gratefully acknowledge the technical and financial support of the Deanship of Scientific Research (DSR).

DISCLOSURE STATEMENT

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This project was funded by the Deanship of Scientific Research (DSR), King Abdulaziz University Jeddah [grant number G-16-611-1441].

Notes on contributors

Fahd Saleh Alotaibi

Fahd S Alotaibi earned his PhD in computer science from the University of Birmingham, United Kingdom, in 2015. He is currently an associate professor in the Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia. His research interests include natural language processing, data science, and data mining.

Saurabh Sharma

Saurabh Sharma is currently an assistant professor at the Thapar Institute of Engineering & Technology, Patiala, and has been in the teaching profession since 2007. He received his PhD from UIET, Panjab University, Chandigarh. His research interests include NLP, computer vision, machine learning, and data analysis. He has published his research findings in many reputed SCI-indexed journals. He received a Senior Research Fellowship (2016-2021) under the Visvesvaraya PhD Scheme for Electronics & IT, Ministry of Communications & IT, Government of India.

Vishal Gupta

Vishal Gupta is an associate professor at the University Institute of Engineering and Technology, Panjab University, Chandigarh, India. He received a BTech degree in computer science & engineering from SBS CET Ferozepur, Punjab, India. He also received an MTech in computer science and engineering from the Department of Computer Science, Punjabi University Patiala, Punjab, India. In 2013, he was awarded a PhD in the Faculty of Engineering and Technology for his research on automatic text summarization for the Punjabi language. His research interests include artificial intelligence, machine/deep learning, natural language processing, and automatic text summarization.

Savita Gupta

Savita Gupta is a professor at the University Institute of Engineering and Technology, Panjab University, Chandigarh, India. She received her ME in computer science and engineering from the Department of Computer Science, Thapar University Patiala, Punjab, India. In 2007, she was awarded a PhD in computer science and engineering for her research in medical imaging. She has published her research findings in many reputed SCI-indexed journals. Her research interests include image processing, medical image analysis, speckle reduction, machine learning, wavelet applications, biometric security, and cognitive radios.
