Computers and computing

ADS-Classif: A Robust Tool for Classifying Documents from Diverse Sources Using Weighted Domain-Specific Vocabulary

Vandana Kalra, Indu Kashyap & Harmeet Kaur
 

Abstract

Extracting domain-specific words from a domain corpus to build a vocabulary, and identifying the classes that vocabulary represents, is a challenging task in text classification. Several comprehensive techniques have been developed to build vocabularies for specific domains, but they lack generality and do not consider the degree of each word's relevance. This paper proposes a supervised, generic algorithmic model to create a domain-specific vocabulary. The model identifies domain-indicative word combinations from a labeled real-time domain corpus, calculates their weights, and logs their inclusion time to understand their semantic behavior in the present era of development. Further, a framework, ADS-Classif, is developed that uses this vocabulary to classify real-time documents extracted from different sources and to test the significance of the domain-specific vocabulary. Experimental results show that the approach identifies the domain classes of test documents more accurately than the classical classifiers Naïve Bayes and Support Vector Machine.
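The general idea of a weighted, time-stamped domain vocabulary can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the weighting formula, so the sketch assumes a term's weight is the share of its occurrences that fall in its dominant domain, uses single words and adjacent word pairs as a stand-in for "word combinations", and keeps a term only when that share exceeds 0.5. The names `build_vocabulary`, `classify`, and `TRAINING` are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone


def extract_terms(text):
    """Return lowercase words plus adjacent word pairs (assumed term units)."""
    tokens = text.lower().split()
    return tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def build_vocabulary(labeled_docs, now=None):
    """Map each domain-indicative term to its domain, weight, and inclusion time.

    Assumption: weight = occurrences in the dominant domain / all occurrences.
    The weighting used in the paper may differ.
    """
    now = now or datetime.now(timezone.utc)
    counts = defaultdict(Counter)  # term -> {domain: occurrences}
    for text, domain in labeled_docs:
        for term in extract_terms(text):
            counts[term][domain] += 1

    vocabulary = {}
    for term, per_domain in counts.items():
        domain, hits = per_domain.most_common(1)[0]
        weight = hits / sum(per_domain.values())
        if weight > 0.5:  # assumed cutoff: term must favor a single domain
            vocabulary[term] = {"domain": domain, "weight": weight, "added": now}
    return vocabulary


def classify(text, vocabulary):
    """Return the domain with the highest summed term weight, or None."""
    scores = Counter()
    for term in extract_terms(text):
        entry = vocabulary.get(term)
        if entry is not None:
            scores[entry["domain"]] += entry["weight"]
    return scores.most_common(1)[0][0] if scores else None


# Hypothetical toy corpus, for illustration only.
TRAINING = [
    ("stock market shares rise", "finance"),
    ("bank raises interest rates", "finance"),
    ("team wins football match", "sports"),
    ("football player scores goal", "sports"),
]
```

Calling `classify("football match tonight", build_vocabulary(TRAINING))` scores the text against the toy vocabulary; a document with no known terms yields `None` rather than a forced label.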

DISCLOSURE STATEMENT

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Vandana Kalra

Vandana Kalra is an associate professor at Sri Guru Gobind Singh College of Commerce, University of Delhi, Delhi, India. She held the post of head of the department of computer science from 2006 to 2019. She has been teaching in this field for 23 years and has been active in multidisciplinary research and innovation. She has a deep passion for innovation related to technology and advanced areas of computer science. She has been involved in many innovation projects and received the best societal impact award for one of them.

Indu Kashyap

Indu Kashyap has 15 years of teaching, administration, and research experience. Currently, she is a professor at Manav Rachna International Institute of Research and Studies. She has more than 60 publications in reputed journals and conferences, including those of Elsevier, Springer, and Taylor & Francis. Her research areas include wireless networks, machine learning, data analytics, and recommender systems.

Harmeet Kaur

Harmeet Kaur is an associate professor in the Department of Computer Science, Hansraj College, University of Delhi. She has around 24 years of teaching experience. Her research interests lie in recommender systems and crowdsourcing. She has published around 45 research papers in national and international journals and conferences of repute. Corresponding author.
