Common Voice

This is a living document on all things related to the Common Voice project.

Feel free to make suggestions!

How to contribute to Numbers + Yes/No

If you want to help with the numbers + yes/no data collection project, you're in the right place! There are three ways to contribute:

  1. You don't have a GitHub account: send an email to me at [email protected] with (1) the language name, (2) the words you're contributing as {english_word: your_translation}, and (3) whether you are a native speaker of the language (YES/NO).
  2. You don't want to send a Pull Request: open a GitHub Issue with the same info as above.
  3. Send a Pull Request (preferred): make your changes in the table and submit them as a PR.
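
To make the requested information concrete, here is a minimal Python sketch of one contribution, using the Basque row from the table below. The field names (`language`, `native_speaker`, `words`) and the helper `missing_words` are hypothetical and only illustrate the {english_word: your_translation} shape; the project does not prescribe a file format.

```python
# The twelve English words every contribution should cover.
REQUIRED_WORDS = [
    "zero", "one", "two", "three", "four", "five",
    "six", "seven", "eight", "nine", "yes", "no",
]


def missing_words(translations):
    """Return the English words that have no (non-empty) translation yet."""
    return [word for word in REQUIRED_WORDS if not translations.get(word)]


# Hypothetical layout for one contribution; the values are the Basque row.
contribution = {
    "language": "Basque",
    "native_speaker": True,
    "words": {
        "zero": "zero", "one": "bat", "two": "bi", "three": "hiru",
        "four": "lau", "five": "bost", "six": "sei", "seven": "zazpi",
        "eight": "zortzi", "nine": "bederatzi", "yes": "bai", "no": "ez",
    },
}

print(missing_words(contribution["words"]))  # prints []
```

Running a check like this before sending an email, Issue, or PR is an easy way to spot a word that was left out.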

Licensing

Common Voice is released under a Creative Commons Zero (CC0) license.

Download

You can download the current release here: Common Voice Download

Current Release

Release Date

December 10, 2019

Language Statistics

| LANGUAGE | # HOURS | # SPEAKERS | LANGUAGE FAMILY |
| --- | --- | --- | --- |
| Abkhaz | <1 hour (validated); <1 hour (total) | 3 speakers (reported: 2% female / 98% male) | Northwest Caucasian |
| Arabic | 7 hours (validated); 12 hours (total) | 228 speakers (reported: 24% female / 48% male) | Afro-Asiatic |
| Basque | 65 hours (validated); 99 hours (total) | 638 speakers (reported: 23% female / 51% male) | Language Isolate |
| Breton | 5 hours (validated); 12 hours (total) | 133 speakers (reported: 2% female / 55% male) | Indo-European |
| Catalan | 245 hours (validated); 295 hours (total) | 3,724 speakers (reported: 35% female / 43% male) | Indo-European |
| Cantonese (Hong Kong) | <1 hour (validated); <1 hour (total) | 15 speakers (reported: 24% female / 37% male) | Sino-Tibetan |
| Chuvash | <1 hour (validated); 2 hours (total) | 38 speakers (reported: 0% female / 47% male) | Turkic |
| Dhivehi | 6 hours (validated); 8 hours (total) | 101 speakers (reported: 64% female / 28% male) | Indo-European |
| Dutch | 24 hours (validated); 33 hours (total) | 701 speakers (reported: 10% female / 66% male) | Indo-European |
| English | 1,118 hours (validated); 1,488 hours (total) | 51,072 speakers (reported: 13% female / 46% male) | Indo-European |
| Esperanto | 35 hours (validated); 41 hours (total) | 215 speakers (reported: 7% female / 70% male) | Indo-European |
| Estonian | 10 hours (validated); 13 hours (total) | 230 speakers (reported: 38% female / 57% male) | Uralic |
| French | 350 hours (validated); 412 hours (total) | 8,164 speakers (reported: 12% female / 65% male) | Indo-European |
| German | 483 hours (validated); 538 hours (total) | 8,460 speakers (reported: 9% female / 67% male) | Indo-European |
| Hakha Chin | 2 hours (validated); 5 hours (total) | 290 speakers (reported: 20% female / 23% male) | Sino-Tibetan |
| Indonesian | 3 hours (validated); 3 hours (total) | 56 speakers (reported: 4% female / 82% male) | Austronesian |
| Interlingua | 1 hour (validated); 3 hours (total) | 12 speakers (reported: 2% female / 94% male) | Indo-European |
| Irish | 2 hours (validated); 4 hours (total) | 80 speakers (reported: 16% female / 59% male) | Indo-European |
| Italian | 85 hours (validated); 122 hours (total) | 4,292 speakers (reported: 18% female / 47% male) | Indo-European |
| Japanese | 3 hours (validated); 3 hours (total) | 52 speakers (reported: 0% female / 81% male) | Japonic |
| Kabyle | 262 hours (validated); 276 hours (total) | 693 speakers (reported: 22% female / 55% male) | Afro-Asiatic |
| Kinyarwanda | <1 hour (validated); 17 hours (total) | 129 speakers (reported: 8% female / 41% male) | Niger-Congo |
| Kyrgyz | 11 hours (validated); 21 hours (total) | 119 speakers (reported: 44% female / 45% male) | Turkic |
| Latvian | 4 hours (validated); 6 hours (total) | 86 speakers (reported: 17% female / 64% male) | Indo-European |
| Mandarin (China) | 26 hours (validated); 31 hours (total) | 963 speakers (reported: 10% female / 64% male) | Sino-Tibetan |
| Mandarin (Taiwan) | 42 hours (validated); 60 hours (total) | 1,108 speakers (reported: 26% female / 48% male) | Sino-Tibetan |
| Mongolian | 9 hours (validated); 12 hours (total) | 296 speakers (reported: 25% female / 36% male) | Mongolic |
| Odia | 0.8 hours (validated); 1.2 hours (total) | 9 speakers (reported: 13% female / 46% male) | Indo-European |
| Persian | 211 hours (validated); 255 hours (total) | 2,763 speakers (reported: 6% female / 78% male) | Indo-European |
| Portuguese | 27 hours (validated); 29 hours (total) | 354 speakers (reported: 2% female / 89% male) | Indo-European |
| Romansh Sursilvan | <1 hour (validated); <1 hour (total) | 3 speakers (reported: 0% female / 75% male) | Indo-European |
| Russian | 72 hours (validated); 76 hours (total) | 496 speakers (reported: 23% female / 71% male) | Indo-European |
| Sakha | 3 hours (validated); 6 hours (total) | 37 speakers (reported: 10% female / 54% male) | Turkic |
| Slovenian | 3 hours (validated); 6 hours (total) | 51 speakers (reported: 16% female / 80% male) | Indo-European |
| Spanish | 167 hours (validated); 221 hours (total) | 8,252 speakers (reported: 10% female / 55% male) | Indo-European |
| Swedish | 5 hours (validated); 6 hours (total) | 99 speakers (reported: 8% female / 74% male) | Indo-European |
| Tamil | 3 hours (validated); 4 hours (total) | 91 speakers (reported: 10% female / 67% male) | Dravidian |
| Tatar | 25 hours (validated); 27 hours (total) | 142 speakers (reported: 2% female / 81% male) | Turkic |
| Turkish | 13 hours (validated); 14 hours (total) | 461 speakers (reported: 8% female / 74% male) | Turkic |
| Votic | <1 hour (validated); <1 hour (total) | 2 speakers (reported: 0% female / 0% male) | Uralic |
| Welsh | 59 hours (validated); 77 hours (total) | 1,149 speakers (reported: 18% female / 29% male) | Indo-European |

Single-digit numbers + yes + no

WARNING: these words, numbers, and spellings are not guaranteed to be correct.

Digits

How would these numbers be read if you were counting out loud or if you were reading a number out loud digit-by-digit?

How would the yes/no be said if you were answering a simple question?

LANGUAGE 0 1 2 3 4 5 6 7 8 9 yes no native speaker verified?
Abkhaz акымзарак акы ҩба хԥа ԥшьба хәба фба быжьба ааба жәба ааи мап YES
Arabic صفر واحد إثنان ثلاثة أربعة خمسة ستة سبعة ثمانية تسعة نعم لا YES
Basque zero bat bi hiru lau bost sei zazpi zortzi bederatzi bai ez YES
Breton mann unan daou tri pevar pemp c'hwec'h seizh eizh nav ya nann NO
Cantonese (Hong Kong) 唔係 YES
Catalan zero u dos tres quatre cinc sis set vuit nou no YES
Chuvash пӗрре иккӗ виҫҫӗ тӑваттӑ пиллӗк улттӑ ҫиччӗ саккӑр тӑххӑр вуннӑ ҫапла ҫук YES
Czech nula jedna dva tři čtyři pět šest sedm osm devět ano ne YES
Danish nul en to tre fire fem seks syv otte ni ja nej YES
Dhivehi އާ ނޫން NO
Dutch nul één twee drie vier vijf zes zeven acht negen ja nee YES
English zero one two three four five six seven eight nine yes no YES
Esperanto nul unu du tri kvar kvin ses sep ok naŭ jes ne YES
Estonian null üks kaks kolm neli viis kuus seitse kaheksa üheksa jah ei NO
French zéro un deux trois quatre cinq six sept huit neuf oui non YES
German null eins zwei drei vier fünf sechs sieben acht neun ja nein YES
Hakha Chin NO
Indonesian nol satu dua tiga empat lima enam tujuh delapan sembilan ya tidak YES
Interlingua NO
Irish a náid a haon a dó a trí a ceathair a cúig a sé a seacht a hocht a naoi NO
Italian zero uno due tre quattro cinque sei sette otto nove no NO
Japanese [Formal / Informal] れい / まる いち / ひと に / ふた さん / み し / よ ご / いつ ろく / む しち / なな はち / や く / ここの はい / うん いいえ / いや YES
Kabyle NO
Kinyarwanda zeru rimwe kabiri gatatu kane gautanu gatandatu umunane icyenda NO
Kyrgyz нөл бир эки үч төрт беш алты жети сегиз тогуз ооба жок NO
Latvian nulle viens divi trīs četri pieci seši septiņi astoņi deviņi NO
Mandarin (China) NO
Mandarin (Taiwan) YES
Mongolian тэг нэг нь хоёр гурав дөрөв тав зургаа долоо найм ес тийм шүү үгүй шүү NO
Odia ଶୂନ ଏକ ଦୁଇ ତିନି ଚାରି ପାଞ୍ଚ ଛଅ ସାତ ଆଠ ନଅ ହଁ ନା YES
Persian صفر یکی دو سه چهار پنج شش هفت هشت نه آره نه NO
Polish zero jeden dwa trzy cztery pięć sześć siedem osiem dziewięć tak nie YES
Portuguese zero um dois três quatro cinco seis [pt-BR also uses "meia"] sete oito nove sim não YES
Romansh Sursilvan NO
Russian ноль один два три четыре пять шесть семь восемь девять да нет YES
Sakha NO
Slovenian nìč êna dvé trí štíri pét šést sédem ósem devét ja ne NO
Spanish cero uno dos tres cuatro cinco seis siete ocho nueve no YES
Swedish noll ett två tre fyra fem sex sju åtta nio ja nej NO
Tamil பூஜ்யம் ஒன்று இரண்டு மூன்று நான்கு ஐந்து ஆறு ஏழு எட்டு ஒன்பது ஆம் இல்லை YES
Tatar ноль бер ике өч дүрт биш алты җиде сигез тугыз әйе юк YES
Turkish sıfır bir iki üç dört beş altı yedi sekiz dokuz evet hayır YES
Votic NO
Welsh sero / dim un dau tri pedwar pump chwech saith wyth naw YES
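
As an illustration of what "reading a number out loud digit-by-digit" means, the following Python sketch looks up each digit of a number in a row of the table above. The `DIGIT_WORDS` mapping and the `read_digits` function are hypothetical names introduced for this example; only the English and Basque rows are copied in, and the same caveat about unverified spellings applies.

```python
# Digit words for 0-9, copied from the English and Basque rows of the table.
DIGIT_WORDS = {
    "English": "zero one two three four five six seven eight nine".split(),
    "Basque": "zero bat bi hiru lau bost sei zazpi zortzi bederatzi".split(),
}


def read_digits(number, language):
    """Read a non-negative integer (or digit string) one digit at a time."""
    words = DIGIT_WORDS[language]
    return " ".join(words[int(digit)] for digit in str(number))


print(read_digits(2019, "English"))  # two zero one nine
print(read_digits(2019, "Basque"))   # bi zero bat bederatzi
```

Passing the number as a string (for example `"007"`) keeps leading zeros, which matter when reading phone numbers or codes aloud.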