Our Publications

    The advancement of speech technology has predominantly favored high-resource languages, creating a significant digital divide for speakers of most Sub-Saharan African languages. To address this gap, we introduce WAXAL, a large-scale, openly accessible speech dataset for 21 languages representing over 100 million speakers. The collection consists of two main components: an Automated Speech Recognition (ASR) dataset containing approximately 1,250 hours of transcribed, natural speech from a diverse range of speakers, and a Text-to-Speech (TTS) dataset with over 180 hours of high-quality, single-speaker recordings of phonetically balanced scripts. This paper details our methodology for data collection, annotation, and quality control, which involved partnerships with four African academic and community organizations. We provide a detailed statistical overview of the dataset and discuss its potential limitations and ethical considerations. The WAXAL datasets are released at this URL under the permissive CC-BY-4.0 license to catalyze research, enable the development of inclusive technologies, and serve as a vital resource for the digital preservation of these languages.
    Large language models (LLMs) have demonstrated strong performance in medical contexts; however, existing benchmarks often fail to reflect the real-world complexity of low-resource health systems accurately. Here we developed a dataset of 5,609 clinical questions contributed by 101 community health workers across 4 Rwandan districts and compared responses generated by 5 LLMs (Gemini-2, GPT-4o, o3-mini, Deepseek R1 and Meditron-70B) with those from local clinicians. A subset of 524 question–answer pairs was evaluated using a rubric of 11 expert-rated metrics, scored on a 5-point Likert scale. Gemini-2 and GPT-4o were the best performers (achieving mean scores of 4.49 and 4.48 out of 5, respectively, across all 11 metrics). All LLMs significantly outperformed local clinicians (P < 0.001) across all metrics, with Gemini-2, for example, surpassing local general practitioners by an average of 0.83 points on every metric (range 0.38–1.10). Although performance degraded slightly when LLMs communicated in Kinyarwanda, the LLMs remained superior to clinicians and were over 500 times cheaper per response. These findings support the potential of LLMs to strengthen frontline care quality in low-resource, multilingual health systems.
    To date, there exist almost no culturally-specific evaluation benchmarks for large language models (LLMs) that cover a large number of languages and cultures. In this paper, we present Global PIQA, a participatory commonsense reasoning benchmark for over 100 languages, constructed by hand by 335 researchers from 65 countries around the world. The 116 language varieties in Global PIQA cover five continents, 14 language families, and 23 writing systems. In the non-parallel split of Global PIQA, over 50% of examples reference local foods, customs, traditions, or other culturally-specific elements. We find that state-of-the-art LLMs perform well on Global PIQA in aggregate, but they exhibit weaker performance in lower-resource languages (up to a 37% accuracy gap, despite random chance at 50%). Open models generally perform worse than proprietary models. Global PIQA highlights that in many languages and cultures, everyday knowledge remains an area for improvement, alongside more widely-discussed capabilities such as complex reasoning and expert knowledge. Beyond its uses for LLM evaluation, we hope that Global PIQA provides a glimpse into the wide diversity of cultures in which human language is embedded.
    Workshop on Large Language Models and Generative AI for Health at AAAI 2025
    Mbaza RBC: Deploying and evaluating an LLM-powered Chatbot for Community Health Workers in Rwanda
    The emergence of Large Language Models (LLMs) offers an opportunity to support health systems, particularly in low- and middle-income countries such as Rwanda, where health infrastructure is limited. By providing information and support to front-line workers, especially community health workers (CHWs), LLMs can improve the quality of care by offering quick access to medical guidelines, supporting clinical decision-making, and facilitating health education in local languages. This work deploys and evaluates the performance of LLM-based chatbots that assist CHWs in Rwanda, focusing on usability, interaction modalities, and local-language processing. A total of 3,000 questions generated by front-line workers using text and voice input methods were analyzed to determine preferences and error rates. Results indicate a strong preference for text-based queries (66%), though voice queries showed high satisfaction (97.5%) with minor transcription errors (2.47%). The most common focus areas for CHW queries were Maternal and Newborn Health, Integrated Community Case Management, and Nutrition. These findings suggest that, while voice interactions hold potential, improvements in speech-to-text models are needed for optimal functionality in low-resource settings.
    The field of text-to-speech (TTS) technology has been rapidly advancing in recent years, and has become an increasingly important aspect of our lives. This presents an opportunity for Africa, especially in facilitating access to information to many vulnerable socio-economic groups. However, the lack of availability of high-quality datasets is a major hindrance. In this work, we create a dataset based on recordings of the Bible. Using an existing Kinyarwanda speech-to-text model we were able to segment and align the speech and the text, and then created a multi-speaker Kinyarwanda TTS model.
    This paper presents a multilingual Automatic Speech Recognition (ASR) model for three East African languages: Kinyarwanda, Swahili, and Luganda. The Common Voice project's African-language datasets were used to produce a curated code-switched dataset of 3,900 hours on which the ASR model was trained. The work included validating the Kinyarwanda dataset and developing a model that achieves a 17.57 Word Error Rate (WER) on that language. The Kinyarwanda model was then fine-tuned across all three curated datasets, achieving an overall WER of 21.91, with a WER of 25.48 for Kinyarwanda, 17.22 for Swahili, and 21.95 for Luganda. The paper emphasizes the necessity of considering the African context when developing effective ASR systems and the importance of supporting many languages when building ASR for languages with limited resources.
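    For readers unfamiliar with the metric above, WER is the ratio of word-level edits (substitutions, insertions, and deletions) needed to turn a model's transcript into the reference transcript, divided by the number of reference words. A minimal, illustrative computation (not the paper's own evaluation code) might look like:

    ```python
    def wer(reference: str, hypothesis: str) -> float:
        """Word error rate, as a percentage: (subs + ins + dels) / reference word count."""
        ref, hyp = reference.split(), hypothesis.split()
        # d[i][j] = minimum edits to turn the first i reference words
        # into the first j hypothesis words (word-level Levenshtein distance)
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i  # delete all i reference words
        for j in range(len(hyp) + 1):
            d[0][j] = j  # insert all j hypothesis words
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
                deletion = d[i - 1][j] + 1
                insertion = d[i][j - 1] + 1
                d[i][j] = min(substitution, deletion, insertion)
        return 100.0 * d[len(ref)][len(hyp)] / len(ref)

    # One substitution and one deletion over a four-word reference: WER = 50.0
    print(wer("a b c d", "a x c"))
    ```

    In practice, evaluation pipelines typically also normalize case and punctuation before scoring, which can change the reported figure noticeably.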
    In most real-time scenarios, such as emergency first response or a patient self-monitoring with a wearable device, access to a physician who can assess potential vital-sign anomalies and provide a recommendation is likely to be impossible, potentially putting the patient at risk. Leveraging the latest advances in Natural Language Processing (NLP), this paper presents the research-driven design and development of a cloud-based conversational AI platform trained to predict vital-sign anomalies and provide recommendations from a dataset created by physicians. To reinforce the learning of the virtual assistant, the Conversation Driven Development (CDD) methodology was adopted to involve end users early in the testing process. The proposed platform will help manage the consequences of low physician-to-patient ratios, especially in developing countries.

    Publications mentioning our work

    Automatic speech recognition (ASR) has advanced in high-resource languages, but most of the world's 7,000+ languages remain unsupported, leaving thousands of long-tail languages behind. Expanding ASR coverage has been costly and limited by architectures that restrict language support, making extension inaccessible to most, all while entangled with ethical concerns when pursued without community collaboration. To transcend these limitations, we introduce Omnilingual ASR, the first large-scale ASR system designed for extensibility. Omnilingual ASR enables communities to introduce unserved languages with only a handful of data samples. It scales self-supervised pre-training to 7B parameters to learn robust speech representations and introduces an encoder-decoder architecture designed for zero-shot generalization, leveraging an LLM-inspired decoder. This capability is grounded in a massive and diverse training corpus; by combining breadth of coverage with linguistic variety, the model learns representations robust enough to adapt to unseen languages. Incorporating public resources with community-sourced recordings gathered through compensated local partnerships, Omnilingual ASR expands coverage to over 1,600 languages, the largest such effort to date, including over 500 never before served by ASR. Automatic evaluations show substantial gains over prior systems, especially in low-resource conditions, and strong generalization. We release Omnilingual ASR as a family of models, from 300M variants for low-power devices to 7B for maximum accuracy. We reflect on the ethical considerations shaping this design and conclude by discussing its societal impact. In particular, we highlight how open-sourcing models and tools can lower barriers for researchers and communities, inviting new forms of participation. Open-source artifacts are available at this URL.

    Let’s bridge the digital gap in Africa

    Contact Us

    info@digitalumuganda.com

    +250795756094

    Kigali Heights 6th Floor