Research Articles

A novel phishing website classification method based on hybrid sampling

Jaya Srivastava & Aditi Sharan
Pages 1-30 | Received 10 Jan 2023, Accepted 03 Jun 2023, Published online: 24 Aug 2023
 

ABSTRACT

In real-world anomaly detection tasks such as credit card fraud detection, cancer patient detection, and phishing website detection, training datasets often suffer from a skewed class distribution. Traditional Machine Learning (ML) classification algorithms, however, assume a balanced class distribution and equal misclassification costs. As a result, when trained on class-imbalanced data, they tend to produce biased and inaccurate predictive models. In this study, we propose four novel phishing website classification models, namely SMOTEENN-XGB, SMOTEENN-RF, SMOTEENN-LR, and SMOTEENN-SVM, by combining the SMOTEENN (SMOTE + ENN) hybrid sampling technique with eXtreme Gradient Boosting (XGB), Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM) classifiers, respectively. We propose SMOTEENN hybrid sampling as a novel approach to addressing class imbalance in phishing website datasets prior to building classification models. To the best of our knowledge, these four models for phishing website detection based on the SMOTEENN hybrid sampling approach have not been published in existing studies.
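The hybrid sampling idea named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the toy two-feature data, the label convention (1 = phishing as the minority class, 0 = legitimate), the helper names `smote` and `enn`, and the neighbour counts are all assumptions made for illustration. SMOTE first adds synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours; ENN then removes any sample whose label disagrees with the majority vote of its nearest neighbours.

```python
import math
import random
from collections import Counter


def k_nearest(point, candidates, k):
    """Return indices of the k candidates closest to point (Euclidean distance)."""
    order = sorted(range(len(candidates)),
                   key=lambda i: math.dist(point, candidates[i]))
    return order[:k]


def smote(X, y, minority, k=5, rng=None):
    """Binary-class SMOTE sketch: add synthetic minority points until balanced.

    Assumes at least two minority samples are present.
    """
    rng = rng or random.Random(0)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(X) - len(minority_pts)) - len(minority_pts)
    X_new, y_new = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority_pts)
        # The closest candidate is the base point itself, so skip it.
        neigh_idx = k_nearest(base, minority_pts, k + 1)[1:]
        neighbour = minority_pts[rng.choice(neigh_idx)]
        gap = rng.random()  # position along the segment base -> neighbour
        X_new.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
        y_new.append(minority)
    return X_new, y_new


def enn(X, y, k=3):
    """Edited Nearest Neighbours sketch (majority-vote variant).

    Keeps a sample only if its label matches the most common label
    among its k nearest neighbours; applied to both classes.
    """
    keep_X, keep_y = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        others = [j for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(x, X[j]))[:k]
        majority = Counter(y[j] for j in nearest).most_common(1)[0][0]
        if majority == label:
            keep_X.append(x)
            keep_y.append(label)
    return keep_X, keep_y


# Hypothetical imbalanced toy data: 40 "legitimate" (0) vs 8 "phishing" (1).
rng = random.Random(42)
legit = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
phish = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(8)]
X = legit + phish
y = [0] * 40 + [1] * 8

X_bal, y_bal = smote(X, y, minority=1, k=5, rng=rng)  # oversampling step
X_clean, y_clean = enn(X_bal, y_bal, k=3)             # cleaning step

print("original:", Counter(y))
print("after SMOTE:", Counter(y_bal))
print("after ENN:", Counter(y_clean))
```

In a full pipeline, the cleaned data would then be used to fit a classifier such as XGB, RF, LR, or SVM. Resampling of this kind is normally applied to the training split only, so that the test set keeps the original class distribution.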

View correction statement:
Correction

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Jaya Srivastava

Jaya Srivastava was born in India. She earned her bachelor’s degree, B.Sc. Computer Science, from Delhi University, New Delhi, India, in 1998. She received two master’s degrees: a Master of Computer Applications (MCA) from Banasthali Vidyapith, Rajasthan, India, in 1991, and an M.Tech. in Computer Applications from the Indian Institute of Technology Delhi (IIT Delhi), New Delhi, India, in 2002. She is presently pursuing a Ph.D. at the School of Computer & Systems Sciences (SC & SS), Jawaharlal Nehru University, New Delhi, India, under the mentorship of Dr. Aditi Sharan, Associate Professor, Jawaharlal Nehru University, New Delhi, India. She is currently working as a System Architect at IIT Delhi. Prior to joining IIT Delhi, she worked for more than 10 years at the National Informatics Centre (NIC), New Delhi, India, which she left as Principal Systems Analyst.

Aditi Sharan

Aditi Sharan was born in India. She earned her B.Sc. degree from Sukhadia University, Udaipur, India, in 1988. She received her master’s degree, M.Sc. Computer Science, from Banasthali Vidyapith, Rajasthan, India, in 1990, and her Ph.D. degree from Jai Narain University, Jodhpur, India, in 1996. She joined Jawaharlal Nehru University (JNU), New Delhi, India, as an Assistant Professor in 2004 and is currently an Associate Professor in its School of Computer & Systems Sciences (SC & SS). She has been actively involved in research for the last 20 years. Her research interests include Machine Learning, Natural Language Processing, Information Retrieval and Extraction, Sentiment Analysis, Ontologies and their applications, and other related fields. She has supervised around 20 Ph.D. and more than 30 M.Tech. students. She has several publications in reputed journals and has presented papers at various national and international conferences in India and abroad. She has delivered invited talks at many institutes of repute in India and abroad.
