OVERSAMPLING-ADAPTIVE SEARCH OPTIMIZATION OF THE EXTREME GRADIENT BOOST MODEL ON AN IMBALANCED SPAM EMAIL DATASET

Date: November 4, 2023


ABSTRACT

Imbalanced datasets present a challenge for machine learning models: the classifier becomes biased towards the majority class and performs poorly on the minority class. In this study, four variants of the Extreme Gradient Boost (XGBoost) model were developed: an XGBoost baseline, XGBoost with random search, XGBoost with adaptive search + SMOTE, and XGBoost with adaptive search + ROSE, each combining a different data augmentation and hyperparameter optimization technique. The models were trained and evaluated on the enron1 spam email dataset. The performance evaluation revealed that oversampling in combination with adaptive search optimization improved the performance of the XGBoost model, achieving a sensitivity of 0.9569, an F1 score of 0.9569, a balanced accuracy of 0.9743, and a Matthews Correlation Coefficient of 0.9485. This study demonstrates the effectiveness of oversampling-adaptive search optimization for improving the performance of XGBoost models on imbalanced datasets, particularly in the context of spam email classification.
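The oversampling step the abstract relies on (SMOTE) can be sketched in a few lines of NumPy. This is a minimal illustration of synthetic-minority interpolation only, not the implementation used in the study; the toy minority samples, the neighbour count `k`, and the function name `smote` are assumptions for the sketch.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE sketch: synthesize n_new minority samples by
    interpolating each randomly picked sample toward one of its
    k nearest minority-class neighbours."""
    rng = np.random.default_rng(rng)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]         # indices of the k nearest neighbours
    base = rng.integers(0, len(X_min), n_new)        # samples to grow from
    neigh = nn[base, rng.integers(0, k, n_new)]      # one random neighbour each
    gap = rng.random((n_new, 1))                     # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

# toy imbalanced setting: 3 minority points in 2-D, 4 synthetic ones added
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
X_new = smote(X_min, n_new=4, k=2, rng=0)
print(X_new.shape)  # (4, 2)
```

Because each synthetic point lies on a segment between two real minority samples, the augmented minority class fills in its own region of feature space rather than duplicating existing points, which is what lets the booster learn a less majority-biased decision boundary.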

Keywords: Adaptive Search, Imbalanced Dataset, Random Search, Spam Emails, XGBoost
