
Details on TTS evaluation? #42

Open
Btlmd opened this issue Sep 14, 2024 · 2 comments

Comments

Btlmd commented Sep 14, 2024

Hello! Thanks for your wonderful work. I'm trying to reproduce your results on the TTS task, and I'm wondering if you could provide more details about how the TTS evaluation was conducted, especially:

  • How many samples, and which ones, are used from the VCTK dataset
  • Which ASR model is used to transcribe the generated speech into text
  • How the WER is calculated, and what kind of text normalization is applied before the calculation

Thanks!
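Since the normalization and WER details asked about above are not specified in the thread, here is a minimal sketch of one common recipe: lowercase, strip punctuation, then compute word-level Levenshtein distance divided by the reference length. The exact normalization the authors used is an open question; this is only an illustrative baseline, not their method.

```python
import re

def normalize(text):
    # Hypothetical normalization: lowercase, drop punctuation except
    # apostrophes, and collapse whitespace. The authors' actual
    # normalization may differ.
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)
    return " ".join(text.split())

def wer(reference, hypothesis):
    # Word error rate: word-level Levenshtein distance between the
    # normalized reference and hypothesis, divided by reference length.
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

With this recipe, `wer("Hello, world!", "hello world")` is 0.0 because punctuation and case are normalized away before scoring; libraries such as jiwer implement the same distance with configurable transforms.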

@JunZhan2000 (Collaborator) commented
Please refer to Appendix C.


jee019 commented Oct 11, 2024

Hi, I have a few questions about zero-shot TTS evaluation using the VCTK dataset.

1. Evaluation Methodology:

We randomly select a 3-second clip from each speaker as the vocal prompt along with a separate text as input.

In the paper, particularly in Appendix C, the evaluation process seems open to interpretation. Could you provide a detailed description of how the evaluation was conducted?
Additionally, did you use all audio files from the VCTK dataset for the evaluation, and did you trim any entries longer than 3 seconds?

2. Dataset Usage:
I am curious about the specific version of the VCTK dataset used in your study. Did you only utilize audio files from the "mic2" recordings in the VCTK 0.92 version?
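For what it's worth, one plausible reading of "randomly select a 3-second clip from each speaker as the vocal prompt" is to take a random contiguous 3-second window from one of the speaker's utterances. The sketch below assumes the waveform is already loaded as a sample sequence with a known sample rate; the function name and the fallback for short utterances are my own assumptions, not details from the paper.

```python
import random

def random_prompt_clip(samples, sample_rate, clip_seconds=3.0, rng=None):
    # Hypothetical prompt selection: return a random contiguous window of
    # clip_seconds from a waveform given as a sequence of samples.
    # Assumption: utterances shorter than clip_seconds are returned whole.
    rng = rng or random.Random()
    clip_len = int(clip_seconds * sample_rate)
    if len(samples) <= clip_len:
        return samples
    start = rng.randrange(len(samples) - clip_len + 1)
    return samples[start:start + clip_len]
```

Whether the authors clipped every utterance this way, used fixed offsets, or discarded utterances shorter than 3 seconds is exactly the ambiguity being asked about here.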
