multipa

MultIPA is yet another model for automatic transcription of speech into phonetic IPA. The idea is that a multilingual speech-to-IPA model trained on a sufficient amount of good phoneme representations will produce output that approximates phonetic transcription. Please check out the paper for details.

Available training languages

At the moment, the training data covers the following languages:

Finnish
Hungarian
Japanese
Maltese
Modern Greek
Polish
Tamil

We aim to include more languages to take into account linguistic diversity.

How to run

First, install the required packages, if needed, by running pip install -r requirements.txt.

Before training a model, you need to convert the transcriptions in the CommonVoice dataset into IPA. To do so, run preprocess.py. For example:

python preprocess.py \
       -l ja pl mt hu fi el ta \
       --num_proc 48
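To give a rough feel for what converting an orthographic transcription into IPA involves, here is a minimal sketch. It is not the repository's conversion code: the function name `to_ipa` and the tiny Finnish table `TOY_FINNISH_G2P` are hypothetical, and a real converter needs a far more complete rule set for each language.

```python
import unicodedata

# Hypothetical toy table covering only a few Finnish spellings.
# Multi-letter keys such as "ng" must be matched before single letters.
TOY_FINNISH_G2P = {
    "ng": "ŋː",
    "ä": "æ",
    "ö": "ø",
    "a": "ɑ",
}


def to_ipa(text: str, table: dict[str, str]) -> str:
    """Greedy longest-match replacement of spellings with IPA symbols.

    Characters that have no entry in the table are copied unchanged.
    """
    # Normalise so that "ä" is a single code point, then lowercase.
    text = unicodedata.normalize("NFC", text).lower()
    keys = sorted(table, key=len, reverse=True)  # longest spellings first
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No rule applies at this position: keep the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


print(to_ipa("kengät", TOY_FINNISH_G2P))  # keŋːæt
```

Matching the longest spelling first is what keeps "ng" from being read as "n" followed by "g".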

Then, run main.py to train a model. For example:

python3 main.py \
        -l ja pl mt hu fi el ta \
        -tr 1000 1000 1000 1000 1000 1000 1000 \
        -te 200 200 200 200 200 200 200 \
        -qf False False False False False False False \
        -a True \
        -s "japlmthufielta-nq-ns" \
        -ns True \
        -v vocab.json \
        -e 10

This trains a model on 7 languages (-l) with 1000 training samples (-tr) and 200 validation samples (-te) for each. Audio samples with bad quality are not filtered out (-qf False), additional data from Forvo are included (-a True), the suffix for the output model folder name is japlmthufielta-nq-ns (-s), orthographic spaces are removed (-ns True), the name of the vocab file is vocab.json (-v), and the number of epochs is set to 10 (-e).
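The per-language options in the command above are given as parallel lists, so the n-th value of -tr, -te, and -qf belongs to the n-th language code in -l. The sketch below illustrates that pairing with argparse. It is not the parser from main.py: only the flag names are taken from the example command, while `build_parser`, `per_language_settings`, and the length check are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: flag names mirror the example command only.
    parser = argparse.ArgumentParser()
    parser.add_argument("-l", nargs="+", required=True)             # language codes
    parser.add_argument("-tr", nargs="+", type=int, required=True)  # training samples
    parser.add_argument("-te", nargs="+", type=int, required=True)  # validation samples
    parser.add_argument("-qf", nargs="+", required=True)            # "True" / "False"
    return parser


def per_language_settings(argv: list[str]) -> dict[str, dict]:
    """Pair each language code with its own sample counts and filter flag."""
    args = build_parser().parse_args(argv)
    lists = (args.tr, args.te, args.qf)
    # Every per-language list must have exactly one value per language.
    if any(len(values) != len(args.l) for values in lists):
        raise ValueError("each per-language option needs one value per language")
    return {
        lang: {"train": tr, "valid": te, "quality_filter": qf == "True"}
        for lang, tr, te, qf in zip(args.l, *lists)
    }


if __name__ == "__main__":
    example = (
        "-l ja pl mt hu fi el ta "
        "-tr 1000 1000 1000 1000 1000 1000 1000 "
        "-te 200 200 200 200 200 200 200 "
        "-qf False False False False False False False"
    ).split()
    for lang, cfg in per_language_settings(example).items():
        print(lang, cfg)
```

Checking the list lengths up front turns a miscounted argument into an immediate error instead of a silently misaligned configuration.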

Model

You can run the model (trained on 1k samples for each language, 9h in total) here.

Notes

If you are using AFS, preprocess.py might fail with OS Error: File too large when a directory reaches the limit on the number of files it can hold.
The additional data from Forvo are not uploaded to this repository.
The full list of IPA symbols was obtained from the Panphon library.

Citation

Chihiro Taguchi, Yusuke Sakai, Parisa Haghani, David Chiang. "Universal Automatic Phonetic Transcription into the International Phonetic Alphabet". INTERSPEECH 2023.
For the time being, you may cite our arXiv paper:

@misc{taguchi2023universal,
      title={Universal Automatic Phonetic Transcription into the International Phonetic Alphabet}, 
      author={Chihiro Taguchi and Yusuke Sakai and Parisa Haghani and David Chiang},
      year={2023},
      eprint={2308.03917},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Contact

Feel free to raise an issue if you find any bugs. Also, feel free to contact me at ctaguchi at nd.edu for collaboration.

