Cyber Threat and Vulnerability Classification Using NLP and Machine Learning Techniques on Text-Based Security Data

Talha Khan; Mubasher Malik; Zahid Aziz; Muhammad Kamran Abid; Muhammad Sabir

Authors

Talha Khan University of Southern Punjab, Multan, Pakistan
Mubasher Malik University of Southern Punjab
Zahid Aziz Department of Computer Science, Emerson University, Multan, Pakistan
Muhammad Kamran Abid Department of Computer Science, Emerson University, Multan, Pakistan
Muhammad Sabir University of Southern Punjab

Keywords:

Cyber Security, Machine Learning, NLP, SVM, Random Forest

Abstract

The rapidly evolving cybersecurity field faces a central problem: detecting and classifying cyber threats precisely. The growing volume and complexity of security data call for machine learning (ML) techniques that automate threat detection. This research evaluates six ML algorithms for cybersecurity threat classification: Logistic Regression, SVM, Random Forest, Naive Bayes, LSTM, and BERT. The models are compared systematically on accuracy, precision, recall, F1-score, and execution time. Preprocessing consists of tokenization, stop-word removal, and TF-IDF vectorization, and several feature encoding approaches are applied to improve the models. The study also examines how categorical and continuous feature encoding affect the results. Its original contribution is an analysis of the performance-speed tradeoff between deep learning models and conventional models in cybersecurity contexts. BERT is the strongest model, reaching 93.8% accuracy and a 96.2% ROC-AUC score at the cost of higher computational requirements. Random Forest and SVM performed comparably, while Naive Bayes had the weakest accuracy and recall. BERT therefore outperforms the other models for cybersecurity threat classification, but its high computational cost prevents it from being deployed in real time.
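The preprocessing steps and evaluation metrics named above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the stop-word list, the regular-expression tokenizer, the function names (`tokenize`, `remove_stop_words`, `tfidf`, `binary_metrics`) and the toy security reports are all hypothetical. The TF-IDF weighting uses smoothed IDF followed by L2 normalization, which mirrors the default weighting of scikit-learn's `TfidfVectorizer`; the paper does not state which variant it used.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; the paper's actual list is not given.
STOP_WORDS = {"a", "an", "the", "is", "of", "to", "in", "and", "on", "via", "for"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the (assumed) stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF with raw term counts, smoothed IDF
    idf(t) = ln((1 + N) / (1 + df(t))) + 1, and L2-normalized document vectors."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        weights = {t: c * idf[t] for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


def binary_metrics(y_true: list[int], y_pred: list[int], positive: int = 1) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for one positive (threat) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy reports, invented purely to exercise the pipeline.
    reports = [
        "SQL injection in the login form exposes user data",
        "Command injection via crafted HTTP header",
        "Phishing email targets finance staff",
    ]
    docs = [remove_stop_words(tokenize(r)) for r in reports]
    for doc, vec in zip(docs, tfidf(docs)):
        print(doc, {t: round(w, 3) for t, w in vec.items()})
    # Hypothetical labels/predictions (1 = threat), not results from the paper.
    print(binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```

Because "injection" occurs in two of the three toy reports, it receives a lower weight than terms unique to one report. In a full pipeline, such vectors would typically be fed to the conventional classifiers (Logistic Regression, SVM, Random Forest, Naive Bayes), whereas LSTM and BERT usually consume token sequences rather than TF-IDF vectors.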

Author Biographies

Mubasher Malik, University of Southern Punjab

Professor Dr. Mubasher Malik is Chairman of the Department of Computer Science at the University of Southern Punjab, Multan.

Zahid Aziz, Department of Computer Science, Emerson University, Multan, Pakistan

Zahid Aziz is a Lecturer in the Department of Computer Science, Emerson University, Multan.

Muhammad Kamran Abid, Department of Computer Science, Emerson University, Multan, Pakistan

Muhammad Kamran is a Lecturer in the Department of Computer Science, Emerson University, Multan.

Muhammad Sabir, University of Southern Punjab

Assistant Professor in the Department of Computer Science.
