This dataset addresses a major gap: most speech AI is trained on English and a few European languages, leaving African languages behind. AfriVoices-KE provides a foundation for building fair, inclusive speech technology for Kenya.
AfriVoices-KE is a 3,000-hour multilingual speech dataset covering five Kenyan languages, with recordings from nearly 5,000 native speakers. It combines scripted and spontaneous speech to support speech technology, such as voice assistants and transcription tools, for underrepresented African languages.