- Create Date June 25, 2024
- Last Updated June 25, 2024
Comparative Analysis of the Effectiveness of Sampling Approaches for Decision Tree Classifier
Abstract:
Classification of imbalanced datasets is a common problem in a wide range of real-world applications. In an imbalanced dataset, one class, typically called the minority class, makes up only a small portion of the total instances, yet misclassifying instances of that class is costly. Sampling is one of the approaches used to reduce misclassification of minority-class instances. This study compares several sampling techniques: random under-sampling (RUS), random over-sampling (ROS), the synthetic minority over-sampling technique (SMOTE), and a sampling approach embedded in Minority Entropy Decision Tree (MEDT), a variant of the ID3 decision tree classifier. Each sampling approach was applied to the imbalanced datasets before they were classified with decision tree classifiers. The results show that the sampling approach proposed in MEDT performed better than the other sampling approaches, with a sensitivity of 0.53 on the thoracic dataset and 0.958 on the cerebral stroke dataset. The sampling approach embedded within MEDT is therefore recommended for handling misclassification of minority-class instances when classifying imbalanced datasets.
Keywords: Imbalanced dataset, decision trees, sampling, classification
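To make the two simplest techniques named in the abstract concrete, the following is a minimal sketch of random under-sampling, random over-sampling, and the sensitivity metric, written with the Python standard library only. It is not the study's implementation: the function names, the toy data, and the choice to balance every class to the same size are assumptions made for illustration, and SMOTE and MEDT are not shown.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Group feature rows by their class label (hypothetical helper)."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups


def random_under_sample(X, y, seed=0):
    """RUS sketch: randomly drop rows until every class matches the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        for row in rng.sample(rows, target):  # sample without replacement
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_over_sample(X, y, seed=0):
    """ROS sketch: randomly duplicate rows until every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        # Keep all original rows, then add random duplicates (with replacement).
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def sensitivity(y_true, y_pred, positive=1):
    """Sensitivity (recall on the positive class): TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy imbalanced data (assumed): 8 majority rows (label 0), 2 minority rows (label 1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_rus, y_rus = random_under_sample(X, y)
X_ros, y_ros = random_over_sample(X, y)
print(Counter(y_rus), Counter(y_ros))
```

On the toy data, under-sampling leaves 2 instances per class and over-sampling leaves 8 per class. In practice the resampling would be applied to the training split only, and sensitivity would be computed on predictions for an untouched test split, with the minority class treated as the positive class.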