ABSTRACT:
Current state-of-the-art spellchecking techniques rely on an efficiently stored list of correctly spelt words in a language, against which the words in a text are checked. Nigerian Pidgin, however, has no compiled list of such proofed spellings, which these techniques require. As a result, people write Nigerian Pidgin using different spelling styles, so the same word is often spelt inconsistently. To solve this problem, which also holds for many other resource-scarce languages, this paper presents a machine learning approach to spellchecking that does not require an existing word list. In this approach, the correct spelling of a word is learnt from the relative frequencies of the various renditions of that word in a document. That is, the technique flags spelling errors using only the words within the document being edited.
Keywords:
Edit distance, Orthography standardisation, Spellchecking, Unigram probabilities.
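
The idea summarised in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the edit-distance threshold of 1, the minimum word length of 4, and the sample sentence (including the variant "watin") are all assumptions chosen for illustration. The sketch counts unigrams in a single document and flags a word when a more frequent word in the same document lies within the edit-distance threshold.

```python
import re
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def flag_variants(text: str, max_distance: int = 1, min_length: int = 4) -> dict:
    """Map each suspected misspelling to the more frequent rendition.

    Assumption (not from the paper): a word is flagged when another word
    in the same document is within `max_distance` edits and occurs more
    often. Comparing raw counts is equivalent to comparing unigram
    relative frequencies, because both share the same token total.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    flagged = {}
    for word, n in counts.items():
        if len(word) < min_length:
            continue  # very short words have too many close neighbours
        best = None
        for other, m in counts.items():
            if other == word or m <= n:
                continue  # only strictly more frequent renditions qualify
            if abs(len(other) - len(word)) > max_distance:
                continue  # length gap alone already exceeds the threshold
            if edit_distance(word, other) <= max_distance:
                if best is None or m > counts[best]:
                    best = other
        if best is not None:
            flagged[word] = best
    return flagged


# Hypothetical sample text: "watin" stands in for a variant of "wetin".
sample = "Wetin dey happen? Wetin you dey do? Wetin be dat? Watin dey sup?"
print(flag_variants(sample))
```

In this sample, "wetin" occurs three times and "watin" once, and the two differ by a single substitution, so the sketch suggests "wetin" for "watin". No external word list is consulted; every decision comes from counts inside the document itself.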