Comparative Study of Random Forest and Decision Tree Model for Loan Default Prediction

ABSTRACT

Predicting which clients will fail to repay their loans remains a difficult task for banks and other financial institutions. Several machine learning algorithms have been applied in the literature to address the problem of misclassification in prediction models. This research applies two classical machine learning algorithms, Random Forest (RF) and Decision Tree (DT), to predict loan-defaulting clients and evaluates which is more effective at classifying new instances. The algorithms classify records using dataset features such as credit score, customer’s age, salary earned, and number of dependents, among other attributes. Both algorithms were trained on a secondary dataset sourced from the benchmark repository Kaggle. The agile methodology was adopted in the design and implementation of the system. Each model was evaluated individually using a classification report (precision, recall, and F1-score) together with the AUC score. In the resulting comparison, random forest outperformed the decision tree, with a reported AUC score of 0.20. The comparative values for the two algorithms indicate that the random forest model suffers less from misclassification than the decision tree model.

Index Terms: Machine Learning, Decision Tree, Random Forest, Loan Default, Misclassification
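
The evaluation measures named in the abstract (precision, recall, F1-score, and AUC) can be illustrated with a minimal, dependency-free sketch. It is not the study's implementation: the function names `precision_recall_f1` and `roc_auc`, the label encoding (1 = default, 0 = repaid), and the toy labels and scores are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive class (assumed: 1 = default)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defaults caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defaults missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as one half."""
    pos: List[float] = [s for t, s in zip(y_true, y_score) if t == 1]
    neg: List[float] = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and model scores, not taken from the study's dataset.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = {
        "model_a": [0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3],
        "model_b": [0.6, 0.5, 0.4, 0.7, 0.6, 0.2, 0.5, 0.3],
    }
    for name, y_score in scores.items():
        y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed 0.5 cut-off
        p, r, f1 = precision_recall_f1(y_true, y_pred)
        print(name, round(p, 3), round(r, 3), round(f1, 3),
              round(roc_auc(y_true, y_score), 3))
```

Comparing two classifiers on the same held-out labels in this way is what allows a statement such as "one model suffers less from misclassification" to be checked. In practice, scikit-learn's `classification_report` and `roc_auc_score` provide the same measures.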