Count vectorizer model based web application vulnerability detection using artificial intelligence approach: Journal of Discrete Mathematical Sciences and Cryptography: Vol 25, No 7

Abstract

A web application is a dynamic, intricate, and interactive program that provides end users with information and services such as utility payments, online communication, e-learning, socializing, shopping, online banking, and income tax filing. Web applications have become a major target for attackers because of their accessibility, availability, and ubiquity. Web application vulnerabilities are hazardous for several reasons; among them, attackers can harm an organization's image and standing. Implementation flaws in a web application allow an attacker to inject user input that violates the syntactic structure of a query, or to inject malicious code. Among the various types of injection flaws, SQL injection (SQLI) is more prominent than XML injection. Both are common application-layer web attacks that allow the attacker to bypass security mechanisms, and the two are therefore ranked as the most common vulnerabilities. Hence, a methodology for detecting and evaluating both SQLI and XML vulnerabilities in web applications is the subject of this research. This work addresses these flaws and proposes an ensemble method to classify Structured Query Language injection vulnerabilities. We selected a benchmark dataset with 33,758 rows containing various types of SQL and XML injection attacks. The raw data is preprocessed to remove artifacts, and feature engineering is then performed using Natural Language Processing techniques to clean the data and extract six types of features: TF-IDF, Word-to-Vector, SkipGram, Count Vectorizer, GloVe, and Continuous Bag of Words. Imbalanced data is handled using sampling techniques, and the best features are selected using four types of validation techniques: Significant Test, PCA, Variance Threshold, and Sbest. The prepared data is provided to an ensemble model with two stages. Stage-1 comprises 9 different types of machine learning models (Multinomial, Gaussian, and Bernoulli Naive Bayes, Logistic Regression, Decision Tree, Random Forest, AdaBoost, and SVC with poly, rbf, and linear kernels). These models are trained on additional vectors such as Google News and GloVe to check a new query, either SQL or XML, for the presence or absence of a vulnerability. Stage-2 accepts a URL from the user and detects the presence of vulnerabilities in its domains and subdomains. The proposed ensemble approach obtained an accuracy of 99%.
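
As a rough illustration of the count vectorizer features named in the title and abstract, the sketch below turns query strings into token-count vectors using only the Python standard library. It is not the authors' implementation: the tokenizer pattern, the function names (`tokenize`, `fit_vocabulary`, `transform`), and the two sample queries are all assumptions made for this example, and the abstract does not say how the paper tokenizes its input.

```python
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not from the paper): runs of letters,
# runs of digits, and single punctuation characters such as quotes or '='.
TOKEN_RE = re.compile(r"[A-Za-z_]+|\d+|[^\sA-Za-z_\d]")


def tokenize(query):
    """Split a query string into lowercase tokens."""
    return [t.lower() for t in TOKEN_RE.findall(query)]


def fit_vocabulary(queries):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({t for q in queries for t in tokenize(q)})
    return {tok: i for i, tok in enumerate(tokens)}


def transform(queries, vocab):
    """Return one count vector per query; tokens not in vocab are ignored."""
    rows = []
    for q in queries:
        counts = Counter(tokenize(q))
        row = [0] * len(vocab)
        for tok, n in counts.items():
            if tok in vocab:
                row[vocab[tok]] = n
        rows.append(row)
    return rows


# Illustrative inputs only; these are not rows from the paper's dataset.
samples = [
    "SELECT name FROM users WHERE id = 1",
    "SELECT name FROM users WHERE id = 1 OR 1 = 1",
]
vocab = fit_vocabulary(samples)
vectors = transform(samples, vocab)
```

In this toy case the two vectors differ only in the columns for `1`, `=`, and `or`, which is the kind of signal a downstream classifier could pick up on. A real pipeline would more likely use an established library implementation and a much larger vocabulary.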

Subject Classification:

68T01
