AI-Powered Public Health Educator
ABSTRACT
Globally, there is a shortage of physicians, nurses, and other health professionals. In the medical field, doctors consult the medical reference manual "MSD Manual", which informs and guides them on the causes of diseases and on evaluation and treatment options. The manual has a consumer version meant for patients, who may not be aware of it or know how to access this knowledge. It is this knowledge that the present work aims to make accessible to the public. In our previous work, we built an AI assistant based on a RAG-LLM architecture. Retrieval-Augmented Generation (RAG) introduces latency from retrieval, potential errors in document selection, and increased system complexity due to the need for a separate retrieval module. LLMs are also limited by the size of their context windows, which sometimes leads to hallucinations. To address these challenges, the industry has produced LLMs with long context windows and a technique called Cache-Augmented Generation (CAG). A current research direction is to compare CAG-based systems with the standard RAG-based approach across different application domains. The present work adopts Cache-Augmented Generation to extend our previous work done with RAG pipelines: it implements a CAG-based application and compares its performance with that of the RAG-based application on the same dataset. An experimental design approach was used. Data was scraped from the website of the reference book, the MSD Manual, used by physicians and pharmacists globally. The knowledge base was passed to a long-context-window large language model (LLM) featuring Cache-Augmented Generation. The documents in the knowledge base are processed and stored in a key-value cache, which acts as a fast-access memory bank for the model.
When a user submits a prompt, the model consults its cached knowledge and generates the answer directly from the stored context, making the process faster and more reliable. The performance of the CAG-based large language model was evaluated using the BERTScore and BLEU metrics. In our experimental runs, BERTScore ranged from 0.5 to 0.7 for the RAG system, while the CAG system yielded values from 0.8 to 0.85. The higher BERTScore means the CAG system not only captures the semantic meaning of sentences, but is also better at handling paraphrasing, coherence, and relevance, and at generating longer, more complex text than the RAG-based system. For the RAG system, the BLEU score ranged from 0.5 to 0.71, indicating that the model generates responses with at best 71% accuracy. For the CAG system, the BLEU score ranged from 0.9 to 0.96, indicating that the model generates responses with 90 to 96% accuracy relative to the reference text. In terms of latency, the CAG system performed far better: it took 3.37 to 3.9 seconds to begin generating a response, compared with 187 to 201 seconds for the RAG system. When deployed in future and made available for public usage, the model could help save human lives. The resulting solution is an improved AI-powered tool that is not meant to replace medical professionals; rather, it provides the public with a rich source of health education and medical information.
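The caching mechanism described above can be sketched as follows. This is a simplified, framework-agnostic illustration, not the system described in this work: the class and method names (`CAGPipeline`, `preload`, `answer`) are hypothetical, and a preloaded text prefix stands in for the transformer's precomputed key-value attention cache that a real CAG system would store.

```python
class CAGPipeline:
    """Simplified sketch of Cache-Augmented Generation (hypothetical names).

    A real CAG system runs the knowledge base through the LLM once and
    stores the resulting key-value attention cache; here a preloaded
    text prefix stands in for that cache.
    """

    def __init__(self):
        self._cache = None  # stands in for the model's KV cache

    def preload(self, documents):
        # One-time cost: process the whole knowledge base up front,
        # so no per-query retrieval step is needed (unlike RAG).
        self._cache = "\n".join(documents)

    def answer(self, question):
        if self._cache is None:
            raise RuntimeError("knowledge base not preloaded")
        # Stand-in for decoding against the cached context: return the
        # first cached line that shares a word with the question.
        q_words = set(question.lower().split())
        for line in self._cache.splitlines():
            if q_words & set(line.lower().split()):
                return line
        return "No cached knowledge matched."


pipeline = CAGPipeline()
pipeline.preload([
    "Malaria is transmitted by Anopheles mosquitoes.",
    "Measles is a highly contagious viral infection.",
])
print(pipeline.answer("How is malaria transmitted?"))
```

The design point the sketch illustrates is that all knowledge-base processing happens once in `preload`, so each query pays no retrieval cost, which is the source of the latency advantage reported above.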
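To make the reported BLEU ranges concrete, the following is a minimal illustration of the kind of n-gram overlap BLEU measures. It computes only clipped unigram precision; the full metric used in evaluations like ours combines clipped precisions over 1- to 4-grams with a brevity penalty, so this is an ingredient of BLEU, not the metric itself.

```python
from collections import Counter


def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision, the simplest ingredient of BLEU.

    Each candidate word is counted as a match at most as many times
    as it appears in the reference ("clipping").
    """
    cand = candidate.lower().split()
    if not cand:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    matched = sum(min(count, ref_counts[word])
                  for word, count in Counter(cand).items())
    return matched / len(cand)


score = unigram_precision(
    "malaria is spread by mosquitoes",
    "malaria is transmitted by anopheles mosquitoes",
)
print(round(score, 2))  # 4 of 5 candidate words appear in the reference
```

A score near 1.0, like the 0.9 to 0.96 range reported for the CAG system, indicates that nearly every n-gram in the generated response also appears in the reference text.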
Keywords – Generative artificial intelligence, large language model (LLM), AI assistant, conversational bot, public health education, medical knowledge
