OVERSAMPLING-ADAPTIVE SEARCH OPTIMIZATION OF THE EXTREME GRADIENT BOOST MODEL ON AN IMBALANCED SPAM EMAIL DATASET
ABSTRACT
Imbalanced datasets present a challenge for machine learning models because they can bias the model towards the majority class, resulting in poor performance on the minority class. In this study, four variants of the Extreme Gradient Boost (XGBoost) model were developed: XGBoost Baseline Model, XGBoost Random Search, XGBoost Adaptive Search + SMOTE (Synthetic Minority Over-sampling Technique), and XGBoost Adaptive Search + ROSE (Random Over-Sampling Examples). Varying data augmentation and hyperparameter optimization techniques were applied to the models, which were trained and evaluated on the enron1 spam email dataset. The performance evaluation revealed that oversampling combined with adaptive search optimization improved the performance of the XGBoost model, achieving a sensitivity of 0.9569, F1 score of 0.9569, balanced accuracy of 0.9743, and Matthews Correlation Coefficient of 0.9485. This study demonstrates the effectiveness of oversampling-adaptive search optimization for improving the performance of XGBoost models on imbalanced datasets, particularly in the context of spam email classification.
Keywords: Adaptive Search, Imbalanced Dataset, Random Search, Spam Emails, XGBoost
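The abstract does not specify the implementation stack, so the following is only a minimal Python sketch of the kind of pipeline it describes: oversampling the minority (spam) class, then tuning XGBoost with an adaptive search. It assumes xgboost, scikit-learn, and imbalanced-learn, stands in scikit-learn's successive-halving random search for the paper's adaptive search, uses SMOTE for oversampling, and treats X and y as placeholders for real enron1 features and labels; it is not the authors' actual method.

```python
# Hedged sketch: assumed libraries and placeholder data, not the paper's pipeline.
import numpy as np
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingRandomSearchCV, train_test_split
from sklearn.metrics import (recall_score, f1_score,
                             balanced_accuracy_score, matthews_corrcoef)
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier

# Placeholder imbalanced data; in practice X, y would be enron1 email features/labels.
rng = np.random.default_rng(42)
X = rng.random((1000, 50))
y = (rng.random(1000) < 0.25).astype(int)  # ~25% minority (spam) class

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 1) Oversample the minority class on the training split only.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

# 2) Adaptive (successive-halving) random search over XGBoost hyperparameters.
param_distributions = {
    "n_estimators": [100, 200, 400],
    "max_depth": [3, 5, 7, 9],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
    "subsample": [0.6, 0.8, 1.0],
    "colsample_bytree": [0.6, 0.8, 1.0],
}
search = HalvingRandomSearchCV(
    XGBClassifier(eval_metric="logloss", random_state=42),
    param_distributions,
    scoring="f1",
    cv=5,
    random_state=42,
)
search.fit(X_res, y_res)

# 3) Evaluate on the untouched test split with the metrics reported in the abstract.
y_pred = search.best_estimator_.predict(X_test)
print("Sensitivity:", recall_score(y_test, y_pred))
print("F1 score:", f1_score(y_test, y_pred))
print("Balanced accuracy:", balanced_accuracy_score(y_test, y_pred))
print("MCC:", matthews_corrcoef(y_test, y_pred))
```

Keeping the oversampling inside the training split, as above, avoids leaking synthetic minority samples into the evaluation data; swapping SMOTE for a ROSE-style oversampler or the halving search for another adaptive strategy follows the same structure.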