This is a living document on all things related to the Common Voice project.
Feel free to make suggestions!
If you want to help with the numbers and yes/no data collection project, you're in the right place! There are three ways to contribute:
- You don't have a GitHub account: send an email to me at [email protected] with (1) the language name, (2) the words you're contributing, formatted as `{english_word: your_translation}`, and (3) YES/NO: are you a native speaker of the language?
- You don't want to send a Pull Request: leave a GitHub Issue with the same info as above.
- Send a Pull Request (preferred): submit your changes to the table as a PR.
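If you are collecting words for several languages, it can help to keep each contribution in a structured form before pasting it into an email or Issue. The sketch below is one possible way to do that in Python; the `build_contribution` helper and the `language` / `words` / `native_speaker` field names are hypothetical and are not a format the project requires.

```python
import json

# The twelve English words being translated (the English row of the word table).
ENGLISH_WORDS = ["zero", "one", "two", "three", "four", "five",
                 "six", "seven", "eight", "nine", "yes", "no"]


def build_contribution(language, translations, native_speaker):
    """Bundle (1) language name, (2) {english_word: your_translation}, (3) YES/NO.

    Hypothetical helper: raises ValueError if a key is not one of the twelve words.
    """
    unknown = sorted(set(translations) - set(ENGLISH_WORDS))
    if unknown:
        raise ValueError(f"not one of the twelve English words: {unknown}")
    return {
        "language": language,
        "words": translations,
        "native_speaker": "YES" if native_speaker else "NO",
    }


contribution = build_contribution("Esperanto", {"one": "unu", "yes": "jes"}, False)
# ensure_ascii=False keeps non-Latin scripts readable in the printed text.
print(json.dumps(contribution, ensure_ascii=False, indent=2))
```

Partial contributions are accepted by this sketch: only unknown keys are rejected, so you can send whichever words you are sure of.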
Common Voice is released under a Creative Commons CC0 license.
You can download the current release here: Common Voice Download
December 10, 2019
LANGUAGE | # HOURS | # SPEAKERS | LANGUAGE FAMILY |
---|---|---|---|
Abkhaz | <1 hour (validated); <1 hour (total) | 3 speakers (reported: 2% female / 98% male) | Northwest Caucasian |
Arabic | 7 hours (validated); 12 hours (total) | 228 speakers (reported: 24% female / 48% male) | Afro-Asiatic |
Basque | 65 hours (validated); 99 hours (total) | 638 speakers (reported: 23% female / 51% male) | Language Isolate |
Breton | 5 hours (validated); 12 hours (total) | 133 speakers (reported: 2% female / 55% male) | Indo-European |
Catalan | 245 hours (validated); 295 hours (total) | 3,724 speakers (reported: 35% female / 43% male) | Indo-European |
Cantonese (Hong Kong) | <1 hour (validated); <1 hour (total) | 15 speakers (reported: 24% female / 37% male) | Sino-Tibetan |
Chuvash | <1 hour (validated); 2 hours (total) | 38 speakers (reported: 0% female / 47% male) | Turkic |
Dhivehi | 6 hours (validated); 8 hours (total) | 101 speakers (reported: 64% female / 28% male) | Indo-European |
Dutch | 24 hours (validated); 33 hours (total) | 701 speakers (reported: 10% female / 66% male) | Indo-European |
English | 1,118 hours (validated); 1,488 hours (total) | 51,072 speakers (reported: 13% female / 46% male) | Indo-European |
Esperanto | 35 hours (validated); 41 hours (total) | 215 speakers (reported: 7% female / 70% male) | Indo-European |
Estonian | 10 hours (validated); 13 hours (total) | 230 speakers (reported: 38% female / 57% male) | Uralic |
French | 350 hours (validated); 412 hours (total) | 8,164 speakers (reported: 12% female / 65% male) | Indo-European |
German | 483 hours (validated); 538 hours (total) | 8,460 speakers (reported: 9% female / 67% male) | Indo-European |
Hakha Chin | 2 hours (validated); 5 hours (total) | 290 speakers (reported: 20% female / 23% male) | Sino-Tibetan |
Indonesian | 3 hours (validated); 3 hours (total) | 56 speakers (reported: 4% female / 82% male) | Austronesian |
Interlingua | 1 hour (validated); 3 hours (total) | 12 speakers (reported: 2% female / 94% male) | Indo-European |
Irish | 2 hours (validated); 4 hours (total) | 80 speakers (reported: 16% female / 59% male) | Indo-European |
Italian | 85 hours (validated); 122 hours (total) | 4,292 speakers (reported: 18% female / 47% male) | Indo-European |
Japanese | 3 hours (validated); 3 hours (total) | 52 speakers (reported: 0% female / 81% male) | Japonic |
Kabyle | 262 hours (validated); 276 hours (total) | 693 speakers (reported: 22% female / 55% male) | Afro-Asiatic |
Kinyarwanda | <1 hour (validated); 17 hours (total) | 129 speakers (reported: 8% female / 41% male) | Niger-Congo |
Kyrgyz | 11 hours (validated); 21 hours (total) | 119 speakers (reported: 44% female / 45% male) | Turkic |
Latvian | 4 hours (validated); 6 hours (total) | 86 speakers (reported: 17% female / 64% male) | Indo-European |
Mandarin (China) | 26 hours (validated); 31 hours (total) | 963 speakers (reported: 10% female / 64% male) | Sino-Tibetan |
Mandarin (Taiwan) | 42 hours (validated); 60 hours (total) | 1,108 speakers (reported: 26% female / 48% male) | Sino-Tibetan |
Mongolian | 9 hours (validated); 12 hours (total) | 296 speakers (reported: 25% female / 36% male) | Mongolic |
Odia | 0.8 hours (validated); 1.2 hours (total) | 9 speakers (reported: 13% female / 46% male) | Indo-European |
Persian | 211 hours (validated); 255 hours (total) | 2,763 speakers (reported: 6% female / 78% male) | Indo-European |
Portuguese | 27 hours (validated); 29 hours (total) | 354 speakers (reported: 2% female / 89% male) | Indo-European |
Romansh Sursilvan | <1 hour (validated); <1 hour (total) | 3 speakers (reported: 0% female / 75% male) | Indo-European |
Russian | 72 hours (validated); 76 hours (total) | 496 speakers (reported: 23% female / 71% male) | Indo-European |
Sakha | 3 hours (validated); 6 hours (total) | 37 speakers (reported: 10% female / 54% male) | Turkic |
Slovenian | 3 hours (validated); 6 hours (total) | 51 speakers (reported: 16% female / 80% male) | Indo-European |
Spanish | 167 hours (validated); 221 hours (total) | 8,252 speakers (reported: 10% female / 55% male) | Indo-European |
Swedish | 5 hours (validated); 6 hours (total) | 99 speakers (reported: 8% female / 74% male) | Indo-European |
Tamil | 3 hours (validated); 4 hours (total) | 91 speakers (reported: 10% female / 67% male) | Dravidian |
Tatar | 25 hours (validated); 27 hours (total) | 142 speakers (reported: 2% female / 81% male) | Turkic |
Turkish | 13 hours (validated); 14 hours (total) | 461 speakers (reported: 8% female / 74% male) | Turkic |
Votic | <1 hour (validated); <1 hour (total) | 2 speakers (reported: 0% female / 0% male) | Uralic |
Welsh | 59 hours (validated); 77 hours (total) | 1,149 speakers (reported: 18% female / 29% male) | Indo-European |
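The `# HOURS` and `# SPEAKERS` cells pack several numbers into one string. A minimal Python sketch for pulling them apart, assuming every row follows the pattern used above; `parse_row` is a hypothetical helper, and reading the gap between the reported female and male percentages and 100% as "other or not reported" is an assumption about how those percentages were computed.

```python
import re

# Matches e.g. "245 hours (validated); 295 hours (total)" or "<1 hour (validated); 17 hours (total)".
HOURS_RE = re.compile(
    r"(<?)([\d.,]+) hours? \(validated\); (<?)([\d.,]+) hours? \(total\)"
)
# Matches e.g. "3,724 speakers (reported: 35% female / 43% male)".
SPEAKERS_RE = re.compile(
    r"([\d,]+) speakers \(reported: (\d+)% female / (\d+)% male\)"
)


def to_hours(prefix, value):
    # "<1" means "less than one hour"; the exact figure is not in the table.
    return None if prefix == "<" else float(value.replace(",", ""))


def parse_row(row):
    """Turn one row of the statistics table into numbers (assumes a well-formed row)."""
    language, hours, speakers, family = (
        cell.strip() for cell in row.strip(" |").split("|")
    )
    h = HOURS_RE.search(hours)
    s = SPEAKERS_RE.search(speakers)
    validated = to_hours(h.group(1), h.group(2))
    total = to_hours(h.group(3), h.group(4))
    female, male = int(s.group(2)), int(s.group(3))
    return {
        "language": language,
        "family": family,
        "validated_hours": validated,
        "total_hours": total,
        # Fraction of recorded hours that has been validated (None if either value is "<1").
        "validated_share": (
            round(validated / total, 2) if validated is not None and total else None
        ),
        "speakers": int(s.group(1).replace(",", "")),
        "female_pct": female,
        "male_pct": male,
        # Assumption: the remainder did not report "female" or "male".
        "other_or_unreported_pct": 100 - female - male,
    }


print(parse_row(
    "Catalan | 245 hours (validated); 295 hours (total) | "
    "3,724 speakers (reported: 35% female / 43% male) | Indo-European |"
))
```

For the Catalan row this gives a validated share of 0.83 (245 of 295 hours) and 22% of speakers not reported as female or male. Cells written as `<1` have no exact value, so the sketch returns `None` for them rather than guessing.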
WARNING: these words, numbers, and spellings are not guaranteed to be correct.
How would these numbers be read if you were counting out loud or if you were reading a number out loud digit-by-digit?
How would the yes/no be said if you were answering a simple question?
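Reading a number digit-by-digit needs only the ten words for 0 to 9 in the table below: any number can be spelled out by looking up each digit in turn. A small Python sketch using the English and German rows; `read_digit_by_digit` is a hypothetical helper, not part of any Common Voice tooling.

```python
# Words for 0-9 in digit order, copied from the English and German rows of the table below.
ENGLISH_DIGITS = ["zero", "one", "two", "three", "four",
                  "five", "six", "seven", "eight", "nine"]
GERMAN_DIGITS = ["null", "eins", "zwei", "drei", "vier",
                 "fünf", "sechs", "sieben", "acht", "neun"]


def read_digit_by_digit(number, digit_words):
    """Spell out a non-negative integer one digit at a time."""
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    # Each character of the decimal string indexes into the word list.
    return " ".join(digit_words[int(digit)] for digit in str(number))


print(read_digit_by_digit(2019, ENGLISH_DIGITS))  # two zero one nine
print(read_digit_by_digit(1210, GERMAN_DIGITS))   # eins zwei eins null
```

Counting out loud is a different case: words such as "twelve" or "twenty" are not built from the digit words alone, so this sketch covers only the digit-by-digit reading.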
LANGUAGE | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | yes | no | native speaker verified? |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Abkhaz | акымзарак | акы | ҩба | хԥа | ԥшьба | хәба | фба | быжьба | ааба | жәба | ааи | мап | YES |
Arabic | صفر | واحد | إثنان | ثلاثة | أربعة | خمسة | ستة | سبعة | ثمانية | تسعة | نعم | لا | YES |
Basque | zero | bat | bi | hiru | lau | bost | sei | zazpi | zortzi | bederatzi | bai | ez | YES |
Breton | mann | unan | daou | tri | pevar | pemp | c'hwec'h | seizh | eizh | nav | ya | nann | NO |
Cantonese (Hong Kong) | 零 | 一 | 二 | 三 | 四 | 五 | 六 | 七 | 八 | 九 | 係 | 唔係 | YES |
Catalan | zero | u | dos | tres | quatre | cinc | sis | set | vuit | nou | sí | no | YES |
Chuvash | | пӗрре | иккӗ | виҫҫӗ | тӑваттӑ | пиллӗк | улттӑ | ҫиччӗ | саккӑр | тӑххӑр | ҫапла | ҫук | YES |
Czech | nula | jedna | dva | tři | čtyři | pět | šest | sedm | osm | devět | ano | ne | YES |
Danish | nul | en | to | tre | fire | fem | seks | syv | otte | ni | ja | nej | YES |
Dhivehi | | | | | | | | | | | އާ | ނޫން | NO |
Dutch | nul | één | twee | drie | vier | vijf | zes | zeven | acht | negen | ja | nee | YES |
English | zero | one | two | three | four | five | six | seven | eight | nine | yes | no | YES |
Esperanto | nul | unu | du | tri | kvar | kvin | ses | sep | ok | naŭ | jes | ne | YES |
Estonian | null | üks | kaks | kolm | neli | viis | kuus | seitse | kaheksa | üheksa | jah | ei | NO |
French | zéro | un | deux | trois | quatre | cinq | six | sept | huit | neuf | oui | non | YES |
German | null | eins | zwei | drei | vier | fünf | sechs | sieben | acht | neun | ja | nein | YES |
Hakha Chin | | | | | | | | | | | | | NO |
Indonesian | nol | satu | dua | tiga | empat | lima | enam | tujuh | delapan | sembilan | ya | tidak | YES |
Interlingua | | | | | | | | | | | | | NO |
Irish | a náid | a haon | a dó | a trí | a ceathair | a cúig | a sé | a seacht | a hocht | a naoi | | | NO |
Italian | zero | uno | due | tre | quattro | cinque | sei | sette | otto | nove | sì | no | NO |
Japanese [Formal / Informal] | れい / まる | いち / ひと | に / ふた | さん / み | し / よ | ご / いつ | ろく / む | しち / なな | はち / や | く / ここの | はい / うん | いいえ / いや | YES |
Kabyle | | | | | | | | | | | | | NO |
Kinyarwanda | zeru | rimwe | kabiri | gatatu | kane | gautanu | gatandatu | | umunane | icyenda | | | NO |
Kyrgyz | нөл | бир | эки | үч | төрт | беш | алты | жети | сегиз | тогуз | ооба | жок | NO |
Latvian | nulle | viens | divi | trīs | četri | pieci | seši | septiņi | astoņi | deviņi | jā | nē | NO |
Mandarin (China) | 零 | 一 | 二 | 三 | 四 | 五 | 六 | 七 | 八 | 九 | 是 | 否 | NO |
Mandarin (Taiwan) | 零 | 一 | 二 | 三 | 四 | 五 | 六 | 七 | 八 | 九 | 是 | 否 | YES |
Mongolian | тэг | нэг нь | хоёр | гурав | дөрөв | тав | зургаа | долоо | найм | ес | тийм шүү | үгүй шүү | NO |
Odia | ଶୂନ | ଏକ | ଦୁଇ | ତିନି | ଚାରି | ପାଞ୍ଚ | ଛଅ | ସାତ | ଆଠ | ନଅ | ହଁ | ନା | YES |
Persian | صفر | یکی | دو | سه | چهار | پنج | شش | هفت | هشت | نه | آره | نه | NO |
Polish | zero | jeden | dwa | trzy | cztery | pięć | sześć | siedem | osiem | dziewięć | tak | nie | YES |
Portuguese | zero | um | dois | três | quatro | cinco | seis (pt-BR also uses "meia") | sete | oito | nove | sim | não | YES |
Romansh Sursilvan | | | | | | | | | | | | | NO |
Russian | ноль | один | два | три | четыре | пять | шесть | семь | восемь | девять | да | нет | YES |
Sakha | | | | | | | | | | | | | NO |
Slovenian | nìč | êna | dvé | trí | štíri | pét | šést | sédem | ósem | devét | ja | ne | NO |
Spanish | cero | uno | dos | tres | cuatro | cinco | seis | siete | ocho | nueve | sí | no | YES |
Swedish | noll | ett | två | tre | fyra | fem | sex | sju | åtta | nio | ja | nej | NO |
Tamil | பூஜ்யம் | ஒன்று | இரண்டு | மூன்று | நான்கு | ஐந்து | ஆறு | ஏழு | எட்டு | ஒன்பது | ஆம் | இல்லை | YES |
Tatar | ноль | бер | ике | өч | дүрт | биш | алты | җиде | сигез | тугыз | әйе | юк | YES |
Turkish | sıfır | bir | iki | üç | dört | beş | altı | yedi | sekiz | dokuz | evet | hayır | YES |
Votic | | | | | | | | | | | | | NO |
Welsh | sero / dim | un | dau | tri | pedwar | pump | chwech | saith | wyth | naw | | | YES |
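Because every row must line up with the 14 columns in the header, a missing or extra `|` shifts every following word into the wrong column, which is easy to miss when reviewing a pull request. A minimal Python sketch of a row checker, assuming each row has exactly 14 cells (language, 0 to 9, yes, no, native speaker verified?); `check_row` is a hypothetical helper and does not check spellings.

```python
# Header of the word table: language, digits 0-9, yes, no, native speaker flag.
COLUMNS = ["LANGUAGE", *"0123456789", "yes", "no", "native speaker verified?"]


def check_row(row):
    """Return a list of problems found in one Markdown row of the word table."""
    # Drop any leading/trailing "|" and split on the remaining cell separators.
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    if len(cells) != len(COLUMNS):
        return [f"expected {len(COLUMNS)} cells, found {len(cells)}"]
    problems = []
    if cells[-1] not in ("YES", "NO"):
        problems.append(f"last cell should be YES or NO, found {cells[-1]!r}")
    # Report empty word cells separately: they are gaps to fill, not layout errors.
    empty = [COLUMNS[i] for i in range(1, len(COLUMNS) - 1) if not cells[i]]
    if empty:
        problems.append("empty: " + ", ".join(empty))
    return problems


for row in [
    "English | zero | one | two | three | four | five | six | seven | eight | nine | yes | no | YES |",
    "Example | w0 | w1 | YES |",
]:
    print(row.split("|")[0].strip(), check_row(row))
```

The first row passes with no problems; the second is reported as having 4 cells instead of 14.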