tts_data/pron_data at master · cadia-lvl/tts_data

History

Name		Name	Last commit message	Last commit date
parent directory ..
diphones		diphones
prondicts		prondicts
README.md		README.md
bin_list_sorted_g2p.txt		bin_list_sorted_g2p.txt
ipd_clean_slt2018.mdl		ipd_clean_slt2018.mdl

README.md

all_tokens_norm_clean_g2p.txt (80417 tokens)

A tab seperated list where each line is like <token> \t <source> \t <phones> where the source is a rudementary identifier for the source of the token and phones is a tab-seperated list of word-phone-predictions, each of which is space seperated. An example:

Token	Source	Phones
Kettir verði ekki bundnir í árborg	malromur	cʰ ɛ h t ɪ r <\t> v ɛ r ð ɪ <\t> ɛ h c ɪ <\t> p ʏ n t n ɪ r <\t> iː <\t> au r p ɔ r k
Níu grunaðir um ölvunarakstur	malromur	n i j ʏ <\t> k r ʏː n a ð ɪ r <\t> ʏ m <\t> œ l v ʏ n a r a k s t ʏ r

Diphone coverage comparisons

Using the complete list of valid diphones in data/diphones/complete_ipa.txt, the tokens in all_tokens_norm_clean_g2p.txt cover approx 81% of the diphones. The ivona dataset alone (8909 tokens), approx 10x smaller than the complete list, covers approx 76 % of the listed valid diphones.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pron_data

pron_data

README.md

all_tokens_norm_clean_g2p.txt (80417 tokens)

Diphone coverage comparisons

Files

pron_data

Directory actions

More options

Directory actions

More options

Latest commit

History

pron_data

Folders and files

parent directory

README.md

all_tokens_norm_clean_g2p.txt (80417 tokens)

Diphone coverage comparisons