IMPROVING PERFORMANCE OF HEART DISEASE PREDICTION WITH BAYESIAN NETWORK MODEL WITH FEATURE SUBSET SELECTION

ABSTRACT

Cardiovascular disease is one of the leading causes of death worldwide and also causes severe morbidity and disability. The growing use of electronic health record (EHR) systems has increased the amount of healthcare data available for analysis and forecasting. Accurate heart disease prediction depends on many interacting factors that are difficult for a human to weigh unaided, which motivates the use of machine learning algorithms. Several machine learning methods, including Random Forest, Logistic Regression, Artificial Neural Network (ANN), K-Nearest Neighbor, and Support Vector Machine (SVM), have been applied to the Cleveland heart dataset; however, comparatively little work has modeled it with a Bayesian Network (BN). This study drew on the widely used Cleveland heart dataset obtained from the UCI repository. Several feature reduction techniques were applied, and a Bayesian Network was used to classify the reduced dataset. Results for different train-test ratios were also compared against 10-fold cross-validation. The results show that applying feature reduction with a 70:30 train-test split improves the classifier's prediction performance. The proposed approach achieved 89% accuracy.

Keywords— Machine Learning, Bayesian Network (BN), Naïve Bayes, Heart Disease, Prediction
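
The evaluation protocol summarized in the abstract (feature subset selection, then a 70:30 hold-out split compared against 10-fold cross-validation) can be illustrated with a minimal sketch. Several parts of it are assumptions and do not reproduce the study's method: a categorical Naïve Bayes classifier stands in for the Bayesian Network (it is the simplest BN structure, with the class as the only parent of every feature), mutual-information ranking stands in for the unspecified feature reduction techniques, and the data are synthetic binary features, not the Cleveland dataset. All function names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

def make_toy_data(n=300, seed=0):
    """Synthetic stand-in data: features 0-1 track the label, features 2-3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = [label if rng.random() < 0.85 else 1 - label for _ in range(2)]
        noise = [rng.randint(0, 1) for _ in range(2)]
        X.append(informative + noise)
        y.append(label)
    return X, y

def mutual_information(column, y):
    """Mutual information (in nats) between one discrete feature and the label."""
    n = len(y)
    joint, px, py = Counter(zip(column, y)), Counter(column), Counter(y)
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())

def select_features(X, y, k):
    """Indices of the k features with the highest mutual information (assumed criterion)."""
    scores = [mutual_information([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(sorted(range(len(scores)), key=lambda j: -scores[j])[:k])

def fit_naive_bayes(X, y):
    """Collect the class counts and per-class feature-value counts."""
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    for row, label in zip(X, y):
        for j, v in enumerate(row):
            value_counts[(j, label)][v] += 1
    return class_counts, value_counts

def predict(model, row):
    """Return the class with the highest log-posterior for one row."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for j, v in enumerate(row):
            # Laplace smoothing; the +2 assumes binary-valued features.
            score += math.log((value_counts[(j, label)][v] + 1) / (count + 2))
        if score > best_score:
            best, best_score = label, score
    return best

def accuracy(train_X, train_y, test_X, test_y, k):
    """Select features on the training part only, then score on the test part."""
    cols = select_features(train_X, train_y, k)
    project = lambda rows: [[r[j] for j in cols] for r in rows]
    model = fit_naive_bayes(project(train_X), train_y)
    hits = sum(predict(model, r) == t for r, t in zip(project(test_X), test_y))
    return hits / len(test_y)

def holdout_accuracy(X, y, k, train_fraction=0.7):
    """Single train-test split; rows are assumed to be in random order already."""
    cut = round(len(X) * train_fraction)
    return accuracy(X[:cut], y[:cut], X[cut:], y[cut:], k)

def cross_val_accuracy(X, y, k, folds=10):
    """Mean accuracy over interleaved folds."""
    scores = []
    for f in range(folds):
        test_idx = set(range(f, len(X), folds))
        train_idx = [i for i in range(len(X)) if i not in test_idx]
        scores.append(accuracy([X[i] for i in train_idx], [y[i] for i in train_idx],
                               [X[i] for i in sorted(test_idx)],
                               [y[i] for i in sorted(test_idx)], k))
    return sum(scores) / folds

X, y = make_toy_data()
holdout = holdout_accuracy(X, y, k=2)
cv = cross_val_accuracy(X, y, k=2)
print(f"70:30 hold-out accuracy: {holdout:.3f}")
print(f"10-fold CV accuracy:     {cv:.3f}")
```

Feature selection is repeated inside each training partition so that the held-out rows never influence which features are kept. Numbers produced on this toy data say nothing about the accuracy reported for the Cleveland dataset.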