Computers and computing

ADS-Classif: A Robust Tool for Classifying Documents from Diverse Sources Using Weighted Domain-Specific Vocabulary

Vandana Kalra, Indu Kashyap & Harmeet Kaur
Pages 8640-8646 | Published online: 19 Feb 2023
 

Abstract

Extracting domain-specific words from a domain corpus to build a vocabulary, and identifying the classes that this vocabulary represents, is a challenging task in text classification. Several comprehensive techniques have been developed to build vocabularies for specific domains, but they lack generality and do not consider the degree of a word's relevance. This paper proposes a supervised, generic algorithmic model for creating a domain-specific vocabulary. The model identifies domain-indicative word combinations in a labeled real-time domain corpus, calculates their weights, and logs their inclusion time to understand their semantic behavior in the current period of development. A framework, ADS-Classif, is also developed that uses this vocabulary to classify real-time documents extracted from different sources and to test the significance of the domain-specific vocabulary. Experimental results show that the approach identifies the domain classes of test documents more accurately than the classical Naïve Bayes and Support Vector Machine classifiers.
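The abstract does not state ADS-Classif's weighting formula, n-gram length, or thresholds, so the following is only a minimal sketch of the general idea under stated assumptions. Word combinations are taken to be unigrams and bigrams; a term's weight is a hypothetical relevance score, the share of its occurrences that fall in one domain; the inclusion time is simply the UTC timestamp at which the vocabulary is built; and a document is assigned to the domain with the highest sum of matched term weights. All function names and the `min_count`/`min_weight` thresholds are hypothetical; this is not the authors' algorithm.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timezone


def tokenize(text):
    """Lowercase and keep alphabetic runs only (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def word_combinations(tokens, max_n=2):
    """Yield unigrams and adjacent word combinations up to max_n words."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def build_vocabulary(labeled_docs, min_count=2, min_weight=0.75, max_n=2):
    """Build a weighted domain-specific vocabulary (hypothetical scheme).

    labeled_docs: iterable of (domain, text) pairs.
    Returns {domain: {term: (weight, inclusion_time)}}, where weight is the
    fraction of the term's total occurrences that fall in this domain
    (1.0 means the term never appears in any other domain).
    """
    counts = defaultdict(Counter)
    for domain, text in labeled_docs:
        counts[domain].update(word_combinations(tokenize(text), max_n))

    totals = Counter()
    for domain_counts in counts.values():
        totals.update(domain_counts)

    included_at = datetime.now(timezone.utc)  # logged inclusion time
    vocab = {}
    for domain, domain_counts in counts.items():
        vocab[domain] = {
            term: (n / totals[term], included_at)
            for term, n in domain_counts.items()
            if n >= min_count and n / totals[term] >= min_weight
        }
    return vocab


def classify(text, vocab, max_n=2):
    """Return (best_domain or None, per-domain scores).

    Each domain's score is the sum, over its vocabulary terms found in the
    text, of term frequency times term weight.
    """
    terms = Counter(word_combinations(tokenize(text), max_n))
    scores = {
        domain: sum(terms[t] * w for t, (w, _) in entries.items() if t in terms)
        for domain, entries in vocab.items()
    }
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores
```

For example, trained on a few labeled "sports" and "finance" sentences, a common word such as "the" that occurs in both domains falls below `min_weight` and is left out, while words such as "team" or "interest rate" get weight 1.0 in their own domain. A document that matches no vocabulary term is left unclassified (`None`) rather than forced into a class.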

DISCLOSURE STATEMENT

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Vandana Kalra

Vandana Kalra is an associate professor at Sri Guru Gobind Singh College of Commerce, University of Delhi, Delhi, India. She was head of the Department of Computer Science from 2006 to 2019. She has taught in this field for 23 years and is active in multidisciplinary research and innovation, with a deep interest in technology and advanced areas of computer science. She has been involved in many innovation projects and received the best societal impact award for one of them.

Indu Kashyap

Indu Kashyap has 15 years of teaching, administration, and research experience. She is currently a professor at Manav Rachna International Institute of Research and Studies. She has more than 60 publications in reputed journals and conferences, including those published by Elsevier, Springer, and Taylor & Francis. Her research areas include wireless networks, machine learning, data analytics, and recommender systems.

Harmeet Kaur

Harmeet Kaur is an associate professor in the Department of Computer Science, Hansraj College, University of Delhi. She has around 24 years of teaching experience. Her research interests lie in recommender systems and crowdsourcing. She has published around 45 research papers in reputed national and international journals and conferences. Corresponding author.
