COMPARATIVE ANALYSIS OF TEXT CATEGORIZATION ALGORITHMS

Create Date October 10, 2020
Last Updated October 10, 2020
JCSA/V24N2/DECEMBER2017/11

Description

ABSTRACT:

Text categorization (also known as text classification) is the task of automatically assigning documents to one or more categories from a pre-specified set. The task has several applications, including spam filtering, identification of document genre, automated indexing of scientific articles according to a predefined thesaurus of technical terms, and automated extraction of metadata. Text categorization matters because unstructured text is the largest readily available source of data, and organizing it manually is infeasible given the number of documents involved and the time required. The accuracy of modern text categorization systems rivals that of trained human professionals. This study experimentally compared four machine learning classifiers used in text categorization: Naïve Bayes, Decision Trees, k-Nearest Neighbour (kNN) and Support Vector Machines (SVM). The classifiers were implemented in the Python programming language. When run on the Reuters dataset, SVM significantly outperformed Naïve Bayes, kNN and Decision Trees, while Decision Trees performed worst of the four algorithms considered. Observations made while running the experiments suggest a trade-off between simplicity and effectiveness. In conclusion, the results of this comparative analysis indicate that SVM is the most effective of the classifiers considered in this study.
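The kind of comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it covers only two of the four classifiers (Naïve Bayes and kNN), uses the Python standard library alone, and runs on a tiny invented corpus rather than the Reuters dataset. The class names, the `accuracy` helper, the labels "grain" and "crude", and all example documents are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(doc):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class KNearestNeighbours:
    """kNN over bag-of-words vectors, using cosine similarity."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, docs, labels):
        self.vectors = [Counter(tokenize(d)) for d in docs]
        self.labels = list(labels)
        return self

    @staticmethod
    def _cosine(a, b):
        dot = sum(a[w] * b[w] for w in a if w in b)
        norm = math.sqrt(sum(v * v for v in a.values())) * \
            math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    def predict(self, doc):
        query = Counter(tokenize(doc))
        ranked = sorted(zip(self.vectors, self.labels),
                        key=lambda pair: self._cosine(query, pair[0]),
                        reverse=True)
        votes = Counter(label for _, label in ranked[:self.k])
        return votes.most_common(1)[0][0]


def accuracy(clf, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(clf.predict(d) == y for d, y in zip(docs, labels))
    return correct / len(docs)


# Invented toy corpus (NOT the Reuters dataset used in the study).
train_docs = ["wheat harvest exports rise", "corn and wheat prices fall",
              "grain shipment wheat tonnes", "crude oil prices climb",
              "opec oil output cut", "oil barrels crude demand"]
train_labels = ["grain", "grain", "grain", "crude", "crude", "crude"]
test_docs = ["wheat grain exports", "crude oil output"]
test_labels = ["grain", "crude"]

for clf in (NaiveBayes(), KNearestNeighbours(k=3)):
    clf.fit(train_docs, train_labels)
    print(type(clf).__name__, accuracy(clf, test_docs, test_labels))
```

Both classifiers share the same `fit`/`predict` interface, so a single evaluation loop can score them on identical held-out documents. A toy corpus this small says nothing about the relative ranking reported in the study; it only shows the shape of such an experiment.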

 

Keywords:

classifier, Decision Trees, k-Nearest Neighbour (kNN), machine learning, Naïve Bayes, Support Vector Machines (SVM), text categorization, text classification

Nigeria Computer Society
