A Cyber Persecution: Classification Using Ensemble Learning

Abstract:

Cyber persecution, popularly known as cyber harassment, is one of the major crimes committed daily in the cyber world. Virtual harassment includes remarks made in chat rooms, the sending of rude or nasty emails, and disturbing others through comments on blogs or social networking sites. This paper classifies harassment in cyberspace using an ensemble learning approach. It compares traditional classifiers with ensemble learning for classifying virtual harassment in online social media networks by training both kinds of model on four different datasets. The seven machine learning algorithms are Naive Bayes (NB), Decision Tree (DT), K-Nearest Neighbour (KNN), Logistic Regression (LR), Neural Network (NN), Quadratic Discriminant Analysis (QDA), and Support Vector Machine (SVM). The four ensemble learning models are AdaBoost, Gradient Boosting, Random Forest, and Max Voting. The results are compared using twelve evaluation metrics: Accuracy, Precision, Recall, F1-measure, Specificity, Matthews Correlation Coefficient (MCC), Cohen's Kappa Coefficient (KAPPA), Area Under Curve (AUC), False Discovery Rate (FDR), False Negative Rate (FNR), False Positive Rate (FPR), and Negative Predictive Value (NPV). For Dataset 1, Logistic Regression had the highest accuracy among the machine learning algorithms (0.6923), while the Max Voting ensemble had the highest accuracy among the ensemble models (0.7047). For Dataset 2, K-Nearest Neighbour, Support Vector Machine, and Logistic Regression shared the highest accuracy among the machine learning algorithms (0.8769), while the Random Forest and Gradient Boosting ensembles shared the highest accuracy among the ensemble models (0.8779). For Dataset 3, the Support Vector Machine had the highest accuracy among the machine learning algorithms (0.9243), while the Random Forest ensemble had the highest accuracy among the ensemble models (0.9258).

Keywords: harassment, machine learning algorithms, ensemble learning model, information security, cybersecurity
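
To make the twelve evaluation metrics listed in the abstract concrete, the following is a minimal sketch of how they could be computed for a binary classifier. It is not the authors' implementation. It assumes labels are encoded as 1 (harassment) and 0 (non-harassment), which the abstract does not state, and the function names (safe_div, confusion_counts, binary_metrics, auc) are hypothetical. Eleven metrics follow from the confusion matrix counts; AUC needs per-sample scores and is computed here with the pairwise ranking formulation.

```python
from math import sqrt


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for labels in {0, 1} (1 = positive class, assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def binary_metrics(y_true, y_pred):
    """Compute the eleven confusion-matrix-based metrics named in the abstract."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    n = tp + tn + fp + fn

    accuracy = safe_div(tp + tn, n)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    specificity = safe_div(tn, tn + fp)
    f1 = safe_div(2 * precision * recall, precision + recall)

    # Matthews Correlation Coefficient
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_den)

    # Cohen's kappa: observed agreement corrected for chance agreement
    p_e = safe_div((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    kappa = safe_div(accuracy - p_e, 1 - p_e)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "specificity": specificity,
        "mcc": mcc,
        "kappa": kappa,
        "fdr": safe_div(fp, fp + tp),  # False Discovery Rate
        "fnr": safe_div(fn, fn + tp),  # False Negative Rate
        "fpr": safe_div(fp, fp + tn),  # False Positive Rate
        "npv": safe_div(tn, tn + fn),  # Negative Predictive Value
    }


def auc(y_true, scores):
    """Area Under the ROC Curve as the fraction of (positive, negative)
    pairs in which the positive sample gets the higher score; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return safe_div(wins, len(pos) * len(neg))
```

Calling binary_metrics(y_true, y_pred) on the held-out predictions of each classifier, and auc(y_true, scores) on its predicted scores, would give one row of a comparison table per model. Returning 0.0 for undefined ratios is a design choice of this sketch; other conventions (such as reporting the value as undefined) are equally reasonable.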