Computers and computing

Keyphrase Extraction Using Enhanced Word and Document Embedding

Fahd Saleh Alotaibi, Saurabh Sharma, Vishal Gupta & Savita Gupta

ABSTRACT

Information in every area of human activity is increasingly available as digital text documents. As the volume of these documents grows rapidly, it becomes necessary to explore the knowledge they contain. Keywords make the search for a document of interest more effective, because they highlight a document's primary concepts and let researchers quickly match a text to their interests. This article proposes an unsupervised keyword extraction approach. The approach retrofits the concept of n-grams with state-of-the-art word and document embeddings. It also proposes a new method that composes document vectors from important word vectors and their idf scores. Higher-order word n-grams are used to improve several unigram embeddings, and a novel task is introduced to produce document embeddings for document representation. The performance of the proposed embeddings is evaluated on four different datasets. The combination of GloVe retrofitted with higher-order word n-grams and the document embedding is the best embedding for extracting keyphrases. The bigram-retrofitted embedding improves the results significantly over the baseline approaches.
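The abstract describes composing a document vector from important word vectors and their idf scores, but it does not give the formula. The sketch below is one plausible reading, not the authors' method: it assumes the document vector is an idf-weighted average of the highest-idf words, and that candidate phrases are ranked by cosine similarity to that vector. The function names, the smoothed idf formula, the `top_k` parameter, and the toy two-dimensional vectors are all illustrative assumptions.

```python
import math
from collections import Counter


def idf_scores(corpus):
    """Smoothed inverse document frequency for each term in a tokenized corpus.

    Assumption: idf = ln((1 + N) / (1 + df)) + 1; the paper may use another variant.
    """
    n_docs = len(corpus)
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def document_vector(tokens, word_vectors, idf, top_k=None):
    """idf-weighted average of the vectors of the top_k highest-idf distinct words."""
    known = {t for t in tokens if t in word_vectors and t in idf}
    ranked = sorted(known, key=lambda t: (-idf[t], t))  # ties broken alphabetically
    chosen = ranked[:top_k] if top_k else ranked
    if not chosen:
        raise ValueError("no token has both a word vector and an idf score")
    dim = len(word_vectors[chosen[0]])
    total = sum(idf[t] for t in chosen)
    return [sum(idf[t] * word_vectors[t][i] for t in chosen) / total for i in range(dim)]


def phrase_vector(phrase, word_vectors, dim):
    """Unweighted average of the word vectors of a candidate phrase."""
    vecs = [word_vectors[w] for w in phrase.split() if w in word_vectors]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def rank_candidates(candidates, doc_vec, word_vectors):
    """Order candidate phrases by similarity to the document vector, best first."""
    dim = len(doc_vec)
    return sorted(
        candidates,
        key=lambda p: cosine(phrase_vector(p, word_vectors, dim), doc_vec),
        reverse=True,
    )


# Toy data (hypothetical): real use would load pretrained embeddings such as GloVe.
word_vectors = {
    "keyphrase": [1.0, 0.0],
    "extraction": [0.9, 0.1],
    "weather": [0.0, 1.0],
    "the": [0.5, 0.5],
}
corpus = [
    ["the", "keyphrase", "extraction"],
    ["the", "weather"],
    ["the", "weather", "report"],
]

idf = idf_scores(corpus)
doc_vec = document_vector(corpus[0], word_vectors, idf, top_k=2)
ranked = rank_candidates(["weather", "keyphrase extraction"], doc_vec, word_vectors)
print(ranked)
```

In this toy setup the stop word "the" occurs in every document, so it receives the lowest idf and is excluded when `top_k=2`; the document vector is then built only from "keyphrase" and "extraction". The n-gram retrofitting step mentioned in the abstract is not covered by this sketch.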

ACKNOWLEDGEMENT

The authors gratefully acknowledge DSR's technical and financial support.

DISCLOSURE STATEMENT

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This project was funded by the Deanship of Scientific Research (DSR), King Abdulaziz University Jeddah [grant number G-16-611-1441].

Notes on contributors

Fahd Saleh Alotaibi

Fahd S Alotaibi earned his PhD in computer science from the University of Birmingham, United Kingdom, in 2015. He is currently an associate professor in the Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia. His research interests include natural language processing, data science, and data mining.

Saurabh Sharma

Saurabh Sharma is currently an assistant professor at the Thapar Institute of Engineering & Technology, Patiala. He has been in the teaching profession since 2007. He holds a PhD from UIET, Panjab University, Chandigarh, and his research interests are in NLP, computer vision, machine learning, and data analysis. He has published his research findings in many reputed SCI-indexed journals. He is a recipient of the Senior Research Fellowship (2016-2021) under the Visvesvaraya PhD Scheme for Electronics & IT, Ministry of Communications & IT, Government of India.

Vishal Gupta

Vishal Gupta is an associate professor at the University Institute of Engineering and Technology, Panjab University, Chandigarh, India. He received a BTech degree in computer science & engineering from SBS CET Ferozepur, Punjab, India, and an MTech in computer science and engineering from the Department of Computer Science, Punjabi University Patiala, Punjab, India. In 2013, he was awarded a PhD in the Faculty of Engineering and Technology for his research in the field of automatic text summarization for the Punjabi language. His research interests include artificial intelligence, machine/deep learning, natural language processing, and automatic text summarization.

Savita Gupta

Savita Gupta is a professor at the University Institute of Engineering and Technology, Panjab University, Chandigarh, India. She received her ME in computer science and engineering from the Department of Computer Science, Thapar University Patiala, Punjab, India. In 2007, she was awarded a PhD degree in computer science and engineering for her research in the field of medical imaging. She has published her research findings in many reputed SCI-indexed journals. Her research interests include image processing, medical image analysis, speckle reduction, machine learning, wavelet applications, biometric security, and cognitive radios.
