Google Launches Open-Source Speech Database to Power AI in African Languages

Google has partnered with African universities and research institutions to launch WAXAL, an open-source speech database aimed at accelerating the development of voice-based artificial intelligence for African languages.

The initiative, developed in collaboration with institutions such as Makerere University in Uganda, the University of Ghana, Digital Umuganda in Rwanda, and the African Institute for Mathematical Sciences (AIMS), provides foundational speech data for 21 Sub-Saharan African languages, including Hausa, Yoruba, Luganda, Acholi, Igbo, Swahili and Fulani (Fula).

WAXAL is designed to support the creation of speech recognition systems, voice assistants, text-to-speech tools and other voice-enabled technologies across key sectors such as education, healthcare, agriculture and public services.

“This dataset provides the critical foundation for students, researchers, and entrepreneurs to build technology on their own terms, in their own languages,” said Aisha Walcott-Bryant, Head of Google Research Africa.

The database contains over 11,000 hours of speech from nearly two million individual recordings, making it one of the largest open speech datasets focused on African languages. Developed over three years with funding and technical support from Google, WAXAL aims to address a long-standing gap in global AI development, where African languages remain significantly underrepresented.

Although Sub-Saharan Africa is home to more than 2,000 languages, fewer than 5% reportedly have the digital resources required for effective natural language processing (NLP), limiting the accuracy and usefulness of AI systems for African users.

WAXAL’s launch aligns with a broader continental push to localise artificial intelligence. In September 2025, the Nigerian government unveiled N-ATLAS, an open-source language model capable of recognising and generating speech in Yoruba, Hausa, Igbo and Nigerian-accented English. In the private sector, startups such as South Africa’s Lelapa AI are developing tools like Vulavula, which offers speech recognition, translation and sentiment analysis.

Under WAXAL’s partnership model, contributing institutions retain ownership of the data they collect, while making it openly available to researchers and developers worldwide.

“For AI to have a real impact in Africa, it must speak our languages and understand our contexts,” said Joyce Nakatumba-Nabende, Senior Lecturer at Makerere University’s School of Computing and Information Technology. “The WAXAL dataset gives our researchers the high-quality data they need to build speech technologies that reflect our unique communities.”

By making African language data openly accessible, WAXAL is expected to strengthen homegrown innovation and support a new generation of AI tools built for African realities, cultures, and communities.
