Retrieval-Augmented Machine Translation for Low-Resource Language Using Tree-of-Thoughts Prompting

Abstract:

Large language models (LLMs) have proven effective in many generative tasks, including machine translation, in which text in a source language is converted into a target language. A major limitation of LLMs is their tendency to hallucinate and generate non-factual responses. To mitigate this limitation, retrieval-augmented generation (RAG) was proposed, and it has been shown to improve model responses. In this study, we use RAG to augment the machine translation capabilities of LLMs, with an emphasis on low-resource languages, which usually lack properly curated data. We used the MENYO-20k English-Yoruba parallel corpus. The tree-of-thoughts prompting technique was used to produce coherent thought sequences that yield more natural-sounding translations. The results show considerable promise for a wider application of RAG to low-resource machine translation tasks. We achieved a BLEU score of 71.1, a chrF score of 80.8, and a TER score of 23.9 (for TER, lower is better). These results surpass those reported in earlier work that used zero-shot and few-shot prompts. We expect more research in this area from organizations and governments alike, especially in Africa, which has numerous ethnic groups and languages. Such work can help meet the public sector's need to communicate services and policies to these ethnic groups in a timelier and more cost-effective manner.
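The abstract does not state how examples are retrieved or how the prompt is assembled, so the following is only a minimal sketch of the general retrieval-augmented idea for translation, not the authors' pipeline. It assumes word-level Jaccard similarity for retrieval, a hypothetical prompt template, and a toy in-memory corpus whose Yoruba targets are placeholders rather than MENYO-20k data. The tree-of-thoughts reasoning step is not shown.

```python
"""Minimal sketch of retrieval-augmented prompt construction for MT.

Assumptions (hypothetical, not the paper's method): retrieval ranks corpus
pairs by word-level Jaccard similarity to the query, and the prompt template
below is illustrative only.
"""
from typing import List, Set, Tuple

# Toy English-Yoruba pairs. Targets are placeholders, NOT MENYO-20k data.
CORPUS: List[Tuple[str, str]] = [
    ("The market opens early.", "<yo: The market opens early.>"),
    ("The school is closed today.", "<yo: The school is closed today.>"),
    ("Vote at the polling unit.", "<yo: Vote at the polling unit.>"),
]


def tokens(text: str) -> Set[str]:
    # Lowercase, replace simple punctuation with spaces, split on whitespace.
    return set(text.lower().replace(".", " ").replace(",", " ").split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    # Size of the intersection over size of the union (0.0 for two empty sets).
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query: str, corpus: List[Tuple[str, str]], k: int = 2) -> List[Tuple[str, str]]:
    # Return the k pairs whose English side is most similar to the query.
    q = tokens(query)
    ranked = sorted(corpus, key=lambda pair: jaccard(q, tokens(pair[0])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, corpus: List[Tuple[str, str]], k: int = 2) -> str:
    # Prepend the retrieved pairs as in-context examples, then the query.
    parts = ["Translate English to Yoruba. Use the examples as guidance."]
    for src, tgt in retrieve(query, corpus, k):
        parts.append(f"English: {src}\nYoruba: {tgt}")
    parts.append(f"English: {query}\nYoruba:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("The school opens today.", CORPUS))
```

With k=2, the query "The school opens today." retrieves "The school is closed today." first and "The market opens early." second. A real system would more likely use a dense or BM25 retriever over the full parallel corpus. The resulting prompt would then be passed to the LLM, with any tree-of-thoughts expansion layered on top.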

Keywords: Low-resource languages, machine translation, large language models, retrieval-augmented generation, digital inclusion, e-governance