Google Research Africa is shifting its AI strategy toward speech technology in a bid to address one of the continent’s biggest digital challenges: most of Africa’s 2,000-plus languages are spoken, not written, leaving them largely absent from AI training data.
To tackle this gap, Google has launched WAXAL, an open-source speech dataset developed over three years and officially released in February 2026. The dataset contains more than 11,000 hours of recorded speech from nearly 2 million audio clips across 21 Sub-Saharan African languages, including Hausa, Yoruba, Luganda and Acholi.
According to Abdoulaye Diack, a program manager at Google Research, voice is critical for AI adoption in Africa, where accents, intonation and code-switching often confuse existing systems trained primarily on Western text data.
The initiative was built through partnerships with African universities, including Makerere University and the University of Ghana. Local partners retain ownership of the data, which has been released as open source to encourage wider innovation.
By prioritising speech rather than relying on text-based translation alone, Google aims to lay the groundwork for AI systems that can better understand and communicate with millions of Africans in their everyday languages.
Source: TechCabal