MULTI-STAGE PASSAGE RANKING AND ANSWER EXTRACTION MODEL FOR OPEN-DOMAIN COMPLEX FACTOID QUESTION ANSWERING SYSTEM

ABSTRACT

Question Answering Systems (QAS) are a fast-growing area of both research and commercial interest. Despite recent advances, QAS still struggle to handle complex factoid questions effectively. The key challenges are (i) low accuracy in passage ranking, scoring, and aggregation, whereby some top-ranked passages do not contain the best answer, and (ii) an inability to answer complex factoid questions because of limited reasoning ability. This research aimed to build an improved, unified, multi-stage deep learning model for open-domain complex factoid QAS. The model integrates lexical and semantic retrieval techniques to reduce the number of top-ranked passages that do not contain the best answer, and it adds reasoning ability so that it can answer complex factoid questions. Rather than being developed entirely from scratch, the improved model builds on pre-trained deep learning models, which were fine-tuned on target datasets such as SQuAD, HotpotQA, and WikiHop to leverage their existing capabilities and improve performance on the specific tasks. The new model achieved the highest performance among the models compared, with an accuracy of 81.0% and an F1 score of 89.0%. These results demonstrate the potential of an improved multi-stage passage ranking and answer extraction model for enhancing open-domain complex factoid question answering.

Keywords: Question Answering System (QAS), open-domain QAS, complex factoid question, lexical retrieval, semantic retrieval, path reasoning
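The abstract states that lexical and semantic retrieval are integrated to keep answer-free passages out of the top ranks, but it does not say how the two signals are combined. The following is only a minimal sketch under stated assumptions, not the authors' method: it assumes Okapi BM25 for the lexical score, cosine similarity over precomputed passage embeddings for the semantic score, and a weighted sum of min-max-normalized scores as the fusion rule. The function names, the weight `alpha`, and the toy three-dimensional "embeddings" are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Lexical signal (assumed): Okapi BM25 score of each tokenized passage."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency of each distinct query term.
    df = {t: sum(1 for d in docs if t in d) for t in set(query)}
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def cosine(u, v):
    """Semantic signal (assumed): cosine similarity of two embedding vectors."""
    dot = sum(a * c for a, c in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(c * c for c in v))
    return dot / (nu * nv) if nu and nv else 0.0


def min_max(xs):
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def hybrid_rank(query_tokens, query_vec, docs_tokens, doc_vecs, alpha=0.5):
    """Return passage indices sorted by alpha * lexical + (1 - alpha) * semantic."""
    lex = min_max(bm25_scores(query_tokens, docs_tokens))
    sem = min_max([cosine(query_vec, v) for v in doc_vecs])
    fused = [alpha * l + (1 - alpha) * s for l, s in zip(lex, sem)]
    return sorted(range(len(docs_tokens)), key=lambda i: fused[i], reverse=True)


# Toy example; the vectors stand in for embeddings from a pre-trained encoder.
passages = [
    "the eiffel tower is in paris".split(),
    "paris is the capital of france".split(),
    "berlin is the capital of germany".split(),
]
query = "capital of france".split()
query_vec = [0.9, 0.1, 0.0]
doc_vecs = [[0.5, 0.5, 0.0], [0.95, 0.05, 0.0], [0.1, 0.0, 0.9]]

ranking = hybrid_rank(query, query_vec, passages, doc_vecs, alpha=0.5)
lexical_only = hybrid_rank(query, query_vec, passages, doc_vecs, alpha=1.0)
print(ranking, lexical_only)
```

In this toy case, BM25 alone ranks the Berlin passage second because it shares the words "capital of", while the fused score promotes the Paris passage that is semantically closer to the query. This is the kind of top-rank correction the abstract describes, although the real system's retrievers, weighting, and later reasoning stages are not specified here.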