Welcome to Bible TTS

Paper - Github

BibleTTS is the largest, highest-quality open Text-to-Speech dataset for any language, and it immediately unlocks the development of high-quality Text-to-Speech models for ten languages spoken in Sub-Saharan Africa. The corpus provides up to 80 hours of single-speaker, studio-quality 48kHz recordings per language, and we release it under a commercial-friendly Creative Commons license.

Corpus Statistics

The BibleTTS corpus consists of high-quality audio released as 48kHz, 24-bit, mono-channel FLAC files. Recordings for each language consist of a single speaker recorded under professional-quality, close-microphone conditions (i.e., without background noise or echo). BibleTTS is rare among public speech corpora for the volume of data available per speaker and its suitability for creating TTS models. Furthermore, the corpus covers ten languages that are under-represented in today’s voice technology landscape, both in academia and in industry.
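As a rough guide to what the audio format above implies for storage, the sketch below computes the size of the decoded (uncompressed PCM) audio from the stated 48kHz, 24-bit, mono parameters. The function name `pcm_bytes` is ours, chosen for illustration. The FLAC files themselves will be smaller, by an amount that depends on the recordings, so treat the result as an upper bound rather than a download size.

```python
# Audio parameters as stated in the corpus description.
SAMPLE_RATE_HZ = 48_000
BIT_DEPTH = 24
CHANNELS = 1


def pcm_bytes(hours: float) -> int:
    """Return the uncompressed PCM size in bytes for `hours` of audio.

    Hypothetical helper for illustration; it does not read any corpus files.
    """
    bytes_per_second = SAMPLE_RATE_HZ * (BIT_DEPTH // 8) * CHANNELS
    return round(hours * 3600 * bytes_per_second)


if __name__ == "__main__":
    # Decoded size of one hour of audio, in decimal megabytes.
    print(f"{pcm_bytes(1) / 1e6:.1f} MB per hour of decoded audio")
```

The same helper can be called with any per-language hour count from the statistics table to estimate how much disk space the decoded audio needs.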

Language	Unaligned Hours	Unaligned Samples	Aligned Hours	Aligned Samples
Ewe	100.1	1,167	86.8	24,957
Hausa	103.2	1,189	86.6	40,603
Kikuyu	90.6	1,189	--	--
Lingala	151.7	1,189	71.6	15,117
Luganda	110.4	1,189	--	--
Luo	80.4	1,189	--	--
Chichewa	115.9	1,162	--	--
Akuapem Twi	75.7	1,189	67.1	28,238
Asante Twi	82.6	1,189	74.9	29,021
Yoruba	93.6	1,189	33.3	10,228
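
Two quantities that the table implies but does not state are the mean length of an aligned sample and the share of unaligned audio that survives alignment. The sketch below derives both for the six languages that have aligned data. It assumes that "Aligned Samples" counts individual clips and that the hour columns are total durations; the table does not say this explicitly. The names `ALIGNED`, `mean_clip_seconds` and `retained_fraction` are ours.

```python
# language -> (unaligned hours, aligned hours, aligned samples),
# copied from the corpus statistics table. Languages whose aligned
# columns are "--" are left out.
ALIGNED = {
    "Ewe": (100.1, 86.8, 24957),
    "Hausa": (103.2, 86.6, 40603),
    "Lingala": (151.7, 71.6, 15117),
    "Akuapem Twi": (75.7, 67.1, 28238),
    "Asante Twi": (82.6, 74.9, 29021),
    "Yoruba": (93.6, 33.3, 10228),
}


def mean_clip_seconds(aligned_hours: float, aligned_samples: int) -> float:
    """Average duration of one aligned sample, in seconds."""
    return aligned_hours * 3600 / aligned_samples


def retained_fraction(unaligned_hours: float, aligned_hours: float) -> float:
    """Share of the unaligned audio that is present in the aligned set."""
    return aligned_hours / unaligned_hours


if __name__ == "__main__":
    for language, (unaligned_h, aligned_h, samples) in ALIGNED.items():
        clip = mean_clip_seconds(aligned_h, samples)
        kept = retained_fraction(unaligned_h, aligned_h)
        print(f"{language:12s} {clip:6.2f} s/sample  {kept:6.1%} retained")
```

Under these assumptions the aligned clips average on the order of ten seconds, and the retained share varies considerably between languages.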
