Cyber Threat and Vulnerability Classification Using NLP and Machine Learning Techniques on Text-Based Security Data

Authors

  • Talha Khan University of Southern Punjab, Multan, Pakistan
  • Mubasher Malik University of Southern Punjab
  • Zahid Aziz Department of Computer Science, Emerson University, Multan, Pakistan
  • Muhammad Kamran Abid Department of Computer Science, Emerson University, Multan, Pakistan
  • Muhammad Sabir University of Southern Punjab

Keywords:

Cyber Security, Machine Learning, NLP, SVM, Random Forest

Abstract

The rapidly evolving cybersecurity sector faces a central problem: detecting and classifying cyber threats accurately. The growing volume and complexity of security data call for machine learning (ML) techniques that can automate threat detection. This study evaluates six algorithms for cybersecurity threat classification: Logistic Regression, Support Vector Machine (SVM), Random Forest, Naive Bayes, Long Short-Term Memory (LSTM), and BERT. Each model is assessed systematically on accuracy, precision, recall, F1-score, and execution time. The preprocessing pipeline applies tokenization, then stop-word removal, and then TF-IDF vectorization, and the study examines how different feature encoding approaches, covering both categorical and continuous features, affect the results. The original contribution of this research is an analysis of the performance-speed trade-off between deep learning models and conventional models in cybersecurity settings. BERT is the best-performing model, achieving 93.8% accuracy and a ROC-AUC of 96.2%, at the cost of higher computational requirements. Random Forest and SVM performed comparably, while Naive Bayes had the lowest accuracy and recall. Although BERT outperforms the other models, its high computational cost hinders its use in real-time deployments.
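The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code: the stop-word list, the tokenization regex, and the TF-IDF variant (raw term counts times a smoothed IDF, ln((1 + N) / (1 + df)) + 1, followed by L2 normalisation, which matches scikit-learn's `TfidfVectorizer` defaults) are assumptions, and the function names `preprocess`, `tfidf`, `binary_metrics` and `roc_auc` are hypothetical. The sparse vectors it produces are the kind of input the conventional models (Logistic Regression, SVM, Random Forest, Naive Bayes) would consume, and the metric helpers compute the per-class scores used to compare them.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (the paper's list is not given).
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "or", "the", "to", "via", "with"}


def preprocess(text):
    """Steps 1-2: lower-case, tokenize into alphanumeric tokens, drop stop words.

    Hyphenated identifiers such as 'cve-2021-44228' are kept as one token.
    """
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]


def tfidf(docs):
    """Step 3: one sparse {term: weight} vector per document.

    Assumed variant: raw term counts x smoothed IDF, ln((1 + N) / (1 + df)) + 1,
    then L2 normalisation of each document vector.
    """
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in Counter(toks).items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


def binary_metrics(y_true, y_pred, positive):
    """Accuracy plus precision, recall and F1 for one threat class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outscores a random negative.

    labels are 1 (positive) / 0 (negative); ties count as half. O(P * N), fine for a demo.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    reports = [
        "SQL injection in the login form allows an attacker to dump the database",
        "Phishing email with a malicious attachment targets finance staff",
        "Buffer overflow in the parser allows remote code execution",
    ]
    for vec in tfidf(reports):
        print(sorted(vec.items(), key=lambda kv: -kv[1])[:3])
    print(binary_metrics(["malware", "phishing", "malware", "malware"],
                         ["malware", "malware", "malware", "phishing"], positive="malware"))
    print(roc_auc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))
```

In the demo, a term shared by two reports, such as "allows", receives a lower weight than terms that occur in only one report, which is the behaviour TF-IDF is meant to provide. In a full experiment one would more likely rely on scikit-learn's `TfidfVectorizer`, `classification_report` and `roc_auc_score`, and could time each model with `time.perf_counter()` to compare execution cost.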

Author Biographies

Mubasher Malik, University of Southern Punjab

Professor Dr. Mubasher Malik is Chairman of the Department of Computer Science at the University of Southern Punjab, Multan.

Zahid Aziz, Department of Computer Science, Emerson University, Multan, Pakistan

Zahid Aziz is a lecturer in the Department of Computer Science, Emerson University, Multan.

Muhammad Kamran Abid, Department of Computer Science, Emerson University, Multan, Pakistan

Muhammad Kamran Abid is a lecturer in the Department of Computer Science, Emerson University, Multan.

Muhammad Sabir, University of Southern Punjab

Muhammad Sabir is an Assistant Professor in the Department of Computer Science at the University of Southern Punjab.

Published

2025-06-27

How to Cite

Khan, T., Malik, M., Aziz, Z., Abid, M. K., & Sabir, M. (2025). Cyber Threat and Vulnerability Classification Using NLP and Machine Learning Techniques on Text-Based Security Data. Journal of Computers and Intelligent Systems, 3(2), 139–152. Retrieved from https://journals.iub.edu.pk/index.php/JCIS/article/view/3815