- Version
- Download 6
- File Size 624.77 KB
- File Count 1
- Create Date May 17, 2025
- Last Updated May 17, 2025
Design and Development of Automatic Speech Recognition (ASR) System for Low-resource Language Using Convolutional Neural Network Model
Abstract:
The advancement of Automatic Speech Recognition (ASR) systems for low-resource languages is a formidable challenge due to restricted linguistic data and computational resources. The Yorùbá language is among the oldest languages in Africa, characterized by a rich literary and grammatical heritage and it is among Low-resource Languages. In this study, convolutional neural network (CNN) model is proposed for ASR in one of the low-resource languages in sub-sahara Africa. This study aims to enhance ASR efficacy for low-resource languages through the utilization of CNN, recognized for their proficiency in extracting hierarchical features from audio input. CNN was calibrated to accurately recognize Yorùbá language speech with scarce annotated audio datasets of 16KB sample generated from 64 individuals who volunteer their speech under ethical considerations. The suggested approach incorporates advanced data augmentation methods to address data scarcity. Experiments are performed on two distinct thresholds of 0.22 and 0.35, illustrating the capability of CNN-based ASR systems to attain competitive accuracy. The calibration results reveal that the developed CNN indicate marginal increase in accuracy from 0.962 at thresholds of 0.22 to 0.9625 at thresholds of 0.35, showing reliable performance in classifying speech data. Character Error Rate (CER) and Word Error Rate (WER), which measure recognition errors at the word and character levels, showed steady improvement, with CER slightly increased from 0.5792 at thresholds of 0.22 to 0.5797 at thresholds of 0.35. Similarly, WER slightly decreased from 0.4589 at thresholds of 0.22 to 0.4578 at thresholds of 0.35. These metrics confirm the model's reliability in recognizing Yorùbá speech patterns for ASR. The findings demonstrate that the developed CNN model, when trained with enriched data, markedly improves speech recognition for underrepresented languages, hence advancing the accessibility and inclusivity of speech technology globally.
Keywords: Automatic speech recognition (ASR), Convolutional neural network (CNN) model, Speech signal processing, Low-resource language, Yorùbá language, Natural Language Processing (NLP).