ABSTRACT:
Remarkable developments in information technology that support many natural languages have stimulated numerous research areas. Text produced in these languages can be analyzed to support intelligent decision making. Pre-processing is the first stage in many text-based intelligent systems and plays a vital role in their performance. The complexity of text pre-processing depends heavily on the natural language involved. This paper analyzes the structure of the Igbo language and presents an effective pre-processing approach that converts Igbo documents into a format that any text-based application for the language can use easily and effectively. The approach is designed to reduce both the dimensionality of the feature space and the time required to process Igbo text in such applications. The system was designed using an object-oriented methodology and implemented in the Python programming language with tools from the Natural Language Toolkit (NLTK). Experimental results show that the dimensionality of the feature space is significantly reduced. This implies lower computational cost and shorter processing time when the pre-processed text is used in further processing, which is expected to improve performance when the approach is adopted by natural language applications.
Keywords: Igbo Language, Stop-Word Removal, Text Normalization, Text Pre-Processing, Tokenization
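The abstract names three pre-processing steps (tokenization, text normalization and stop-word removal) and claims they shrink the feature space. The following minimal Python sketch illustrates that effect under stated assumptions; it is not the authors' implementation. It uses only the standard library (`unicodedata`) instead of NLTK, limits normalization to Unicode NFC composition plus lowercasing, and uses `HYPOTHETICAL_IGBO_STOPWORDS`, a small illustrative set of Igbo function words rather than the stop-word list built in the paper. The tokenizer keeps combining marks inside tokens so that a dot-below vowel carrying a tone mark is not split apart.

```python
import unicodedata

# Hypothetical, illustrative subset of Igbo function words (an assumption for
# this sketch, NOT the stop-word list developed in the paper).
HYPOTHETICAL_IGBO_STOPWORDS = frozenset({"na", "nke", "ya", "ha"})


def normalize(text: str) -> str:
    """Compose characters with Unicode NFC, then lowercase."""
    return unicodedata.normalize("NFC", text).lower()


def tokenize(text: str) -> list[str]:
    """Split on every character that is not a letter (L*) or combining mark (M*).

    Keeping combining marks means a vowel plus a tone mark stays in one token.
    Apostrophes and punctuation are treated as separators (a simplification).
    """
    cleaned = "".join(
        ch if unicodedata.category(ch)[0] in ("L", "M") else " " for ch in text
    )
    return cleaned.split()


def remove_stopwords(tokens: list[str], stopwords: frozenset[str]) -> list[str]:
    """Drop tokens that appear in the stop-word set."""
    return [t for t in tokens if t not in stopwords]


def preprocess(
    text: str, stopwords: frozenset[str] = HYPOTHETICAL_IGBO_STOPWORDS
) -> list[str]:
    """Normalize, tokenize, then remove stop words."""
    return remove_stopwords(tokenize(normalize(text)), stopwords)


if __name__ == "__main__":
    sample = "Ada na Obi gara ahịa. Ha zụtara ji na ede."
    raw_vocab = set(sample.split())       # naive whitespace "features"
    clean_vocab = set(preprocess(sample))  # features after pre-processing
    print(len(raw_vocab), len(clean_vocab))  # prints: 9 7
```

On this toy sentence the vocabulary drops from 9 to 7 distinct features: case folding merges capitalized variants, punctuation no longer creates distinct forms such as `ahịa.`, and stop words disappear. Real gains depend on the corpus and on the actual stop-word list used.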